Abstract

Let $A = (a_{ij})$ be an $n \times n$ random matrix with i.i.d. entries such that $\mathbb{E}a_{11} = 0$
and $\mathbb{E}a_{11}^2 = 1$. We prove that for any $\delta > 0$ there is $L > 0$ depending only on
$\delta$, and a subset $\mathcal{N}$ of $B_2^n$ of cardinality at most $\exp(\delta n)$ such that with probability
very close to one we have

$$A(B_2^n) \subset \bigcup_{y \in A(\mathcal{N})} \bigl(y + L\sqrt{n}\,B_2^n\bigr).$$

In fact, a stronger statement holds true. As an application, we show that for some
$L' > 0$ and $u \in [0,1)$ depending only on the distribution law of $a_{11}$, the smallest
singular value $s_n$ of the matrix $A$ satisfies $\mathbb{P}\{s_n(A) \leq \varepsilon n^{-1/2}\} \leq L'\varepsilon + u^n$ for all
$\varepsilon > 0$. The latter result generalizes a theorem of Rudelson and Vershynin which
was proved for random matrices with subgaussian entries.


1 Introduction 

In this paper, we consider random matrices $A$ satisfying

$A$ is $n \times n$; the entries of $A$ are i.i.d., with $\mathbb{E}a_{ij} = 0$, $\mathbb{E}a_{ij}^2 = 1$. (*)

We are concerned with the following question: how many translates of a Euclidean ball
$\sqrt{n}B_2^n$ (or its constant multiple) are needed to cover the random ellipsoid $A(B_2^n)$? Being
geometrically natural, this problem, as we will see later, has an application to studying
invertibility properties of the matrix $A$.

If the entries of $A$ have a bounded fourth moment then the operator norm $\|A\|_{2\to2}$
satisfies $\|A\|_{2\to2} \leq L\sqrt{n}$ with probability close to one (see [31] and [9] for precise
statements), whence $\mathbb{P}\{A(B_2^n) \subset L\sqrt{n}B_2^n\} \approx 1$. If, moreover, the entries of $A$ are
subgaussian then for some $L > 0$ depending only on the subgaussian moment we have
$\mathbb{P}\{A(B_2^n) \subset L\sqrt{n}B_2^n\} \geq 1 - \exp(-n)$. On the other hand, for heavy-tailed entries the
operator norm of $A$ may have a higher order of magnitude compared to $\sqrt{n}$ with probability
close to one, so the trivial argument given above is not applicable. The first main result
of the paper is the following theorem:

Theorem A. Let 5 G (0,1/4] and n> Then there is a (non-random) collection C 
of parallelepipeds in M"’ with \C\ < exp(13n(51n having the following property: For any 
random matrix A satisfying ([f|, with probability at least 1 — 4exp(—(5n/8) we have 

V a; G Bf 3P G C such that x E P and A{P) C Ax H- '^{—Blf. 

0 

Here, C > 0 is a universal constant. 

In particular, the above theorem implies the following more elegant 

Corollary A. For any 6 G (0,1/4] and n> there exists a non-random subset M C Blf 
of cardinality at most exp(13n51n such that for any n x n matrix A satisfying ([f|, we 
have 

^[A{Bf) c IJ (i/ + > 1 - 4exp(-<5n/8) 

y&A{N) 

for some universal constant C > Q. 

Both results have a geometric interpretation in terms of covering numbers. Recall that
for two subsets $S$ and $K$ of a vector space the covering number $N(S,K)$ is defined as
the smallest number of parallel translates of $K$ sufficient to cover $S$. By Theorem A,
N(A(R 2 )) < exp(l3(fnln with probability at least 1 — 4exp(—(fn/8). 
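
The definition of a covering number can be made concrete on a finite point cloud. The sketch below is an illustration only and is not part of the paper's argument: the function name `greedy_cover`, the planar sample and the radius are all assumptions chosen for the example. It builds a cover by Euclidean balls greedily, so the number of centers it returns is an upper bound for the covering number of the sample by balls of that radius centered at sample points.

```python
import math
import random

def greedy_cover(points, radius):
    """Return indices of centers such that every point lies within
    `radius` of at least one center (greedy, not necessarily optimal)."""
    centers = []
    uncovered = set(range(len(points)))
    while uncovered:
        c = min(uncovered)  # any uncovered point may serve as a new center
        centers.append(c)
        # keep only the points that the new ball does not reach
        uncovered = {i for i in uncovered
                     if math.dist(points[i], points[c]) > radius}
    return centers

# Hypothetical sample: 500 points in the square [-1, 1]^2.
random.seed(0)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
centers = greedy_cover(pts, 0.5)
print(len(centers), "balls of radius 0.5 suffice for this sample")
```

The centers produced this way are pairwise more than `radius` apart, so the same set is also a separated set; this is the usual link between covering and packing numbers.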

Another interpretation of these results, which will be of use for us, is related to net
refinement (see Theorem A* in Section 5). Given a metric space $X$, an $\varepsilon$-net $\mathcal{N}$ on $X$ is
a subset of $X$ such that any point of $X$ is within distance at most $\varepsilon$ from a point of $\mathcal{N}$. It
is easy to see that with probability at least 1 — 4exp(—(5?7,/8) the set M from Corollary A 
is a ^^^-net on Blf with respect to the pseudometric d(x,y) := \\A{x — 1 /)|] {x,y G Btf). 
Here and further, $\|\cdot\|$ denotes the standard Euclidean norm in $\mathbb{R}^n$.

A crucial feature of these results is that the set $\mathcal{C}$ in the theorem is non-random.
Moreover, $\mathcal{C}$ (as well as the set $\mathcal{N}$ from Corollary A) provides a “universal” covering
which is independent of the distribution of the entries of $A$.

Finally, compared to Corollary A, the statement of Theorem A is more flexible as it
enables us to choose the “anchor” points within the parallelepipeds when constructing
the corresponding $\varepsilon$-net (this matter is covered in detail at the beginning of Section 5).

Let us briefly describe the main idea of the proof. The collection $\mathcal{C}$ of parallelepipeds
is constructed using a special subset $\mathcal{D}$ of diagonal operators with diagonal elements in
the interval $(0,1]$. Namely, we define $\mathcal{D}$ as the set of all diagonal operators with diagonal
entries in $\{1\} \cup \{2^{-2^k}\}_{k=0}^{\infty}$ and with determinants bounded from below by $\exp(-\delta n)$. Then,
for every operator $D$ from $\mathcal{D}$, we take a covering of the ball $B_2^n$ by appropriate translates
of the parallelepiped $D(L'' n^{-1/2} B_\infty^n)$ (for some $L'' = L''(\delta)$), and let $\mathcal{C}$ be the union of such
coverings over $\mathcal{D}$. It turns out that Theorem A follows almost immediately from the
following relation:

$$\mathbb{P}\Bigl\{\exists \text{ diagonal matrix } D \text{ with diagonal entries in } \{1\} \cup \{2^{-2^k}\}_{k=0}^{\infty} \text{ such that } \det D \geq \exp(-\delta n) \text{ and } \|AD\|_{\infty\to2} \leq \frac{Cn}{\sqrt{\delta}}\Bigr\} \geq 1 - 4\exp(-\delta n/8). \quad (1)$$

In Section 3 we show that (1) holds true under condition (*); see Theorem 3.1. Geometrically,
this property means that it is possible to construct a random parallelepiped $P \subset
[-1,1]^n$ with sides parallel to the standard coordinate axes, such that $\mathrm{Vol}(P) \geq \exp(-\delta n)$
and $A$ maps $P$ inside the Euclidean ball of radius $Cn/\sqrt{\delta}$ with probability at least $1 - 4\exp(-\delta n/8)$.
Note that the parallelepiped $P$ will be “narrow” along directions $w \in S^{n-1}$ for which $\|Aw\|$
is large.
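
To see what a bound on $\|AD\|_{\infty\to2}$ measures, the following sketch computes the $\ell_\infty \to \ell_2$ operator norm of a small matrix by brute force over the vertices of the cube (the maximum of a convex function over $[-1,1]^n$ is attained at a vertex), and compares it before and after multiplying by a diagonal contraction. The matrix, the diagonal entries and the helper names are hypothetical and chosen only for illustration; the enumeration is exponential in $n$ and is meant for tiny examples.

```python
import itertools
import math

def norm_inf_to_2(M):
    """max of ||M v||_2 over vertices v in {-1, 1}^n (exhaustive search)."""
    n = len(M[0])
    best = 0.0
    for v in itertools.product((-1.0, 1.0), repeat=n):
        Mv = [sum(row[j] * v[j] for j in range(n)) for row in M]
        best = max(best, math.sqrt(sum(x * x for x in Mv)))
    return best

def scale_columns(M, d):
    """Return the product M D for the diagonal matrix D = diag(d)."""
    return [[row[j] * d[j] for j in range(len(d))] for row in M]

# Hypothetical matrix with one "heavy" column.
A = [[1.0, 50.0, -1.0],
     [-1.0, 40.0, 1.0],
     [1.0, -30.0, 1.0]]
d = [1.0, 2.0 ** -4, 1.0]   # contraction shrinking only the heavy column
before = norm_inf_to_2(A)
after = norm_inf_to_2(scale_columns(A, d))
print(before, after)
```

Since a diagonal contraction maps the cube into itself, `after` can never exceed `before`; the point of relation (1) is that a large reduction is possible while the determinant of the contraction stays bounded from below.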




As we already mentioned above, Theorem A has a direct application to the problem
of obtaining quantitative (non-asymptotic) estimates for the smallest singular value of $A$.
Recall that, given an $m \times n$ ($m \geq n$) matrix $M$, its smallest singular value can be defined
as $s_n(M) = \inf_{y \in S^{n-1}} \|My\|$. An argument based on Theorem A and results of Rudelson and

Vershynin from [201 HH], yields: 


Theorem B. For any $v \in (0,1]$ and $u \in (0,1)$ there are numbers $L > 0$, $\tilde{u} \in (0,1)$
and $n_0 \in \mathbb{N}$ depending only on $v$ and $u$ with the following property: Let $n \geq n_0$ and let
$A = (a_{ij})$ be an $n \times n$ random matrix satisfying (*), such that $\sup_{\lambda \in \mathbb{R}} \mathbb{P}\{|a_{11} - \lambda| \leq v\} \leq u$.
Then for any $\varepsilon > 0$ we have

$$\mathbb{P}\{s_n(A) \leq \varepsilon n^{-1/2}\} \leq L\varepsilon + \tilde{u}^n.$$

Note that any random variable $\alpha$ with $\mathbb{E}\alpha = 0$ and $\mathbb{E}\alpha^2 = 1$ obviously satisfies
$\sup_{\lambda \in \mathbb{R}} \mathbb{P}\{|\alpha - \lambda| \leq v\} \leq u$ for some $v > 0$ and $u \in (0,1)$ determined by the law of $\alpha$.
Thus, the above statement does not require any additional assumptions on the matrix
apart from (*); by introducing the quantities $v$ and $u$ we make the dependence of $L$ and
$\tilde{u}$ on the law of $a_{11}$ more explicit.
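
The quantity $\sup_{\lambda} \mathbb{P}\{|a_{11} - \lambda| \leq v\}$ appearing in Theorem B is easy to evaluate for a finitely supported law. The sketch below is an illustration under that assumption; the function name `concentration` and the two example laws are not taken from the paper. It uses the fact that, for finitely many atoms, a window $[\lambda - v, \lambda + v]$ of maximal mass can be slid until its left endpoint sits on an atom.

```python
def concentration(atoms, probs, v):
    """sup over lambda of P{|a - lambda| <= v} for a finitely supported law."""
    best = 0.0
    for left in atoms:
        # closed window [left, left + 2v], i.e. lambda = left + v
        mass = sum(p for x, p in zip(atoms, probs) if left <= x <= left + 2 * v)
        best = max(best, mass)
    return best

# Rademacher variable: mean 0, variance 1.
rademacher = concentration([-1.0, 1.0], [0.5, 0.5], 0.5)

# A law with a large atom at zero: mean 0 and variance 9 * (1/9) = 1.
sparse = concentration([-3.0, 0.0, 3.0], [1 / 18, 8 / 9, 1 / 18], 0.5)
print(rademacher, sparse)
```

Both laws satisfy the normalization (*), yet the admissible value of $u$ for $v = 1/2$ differs (1/2 versus 8/9), which is the kind of dependence on the law that the parameters $v$ and $u$ are meant to record.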

Let us put Theorem B in the context of known results. 

Convergence of (appropriately normalized) smallest singular values for a sequence of 
random rectangular matrices with i.i.d. entries and growing dimensions was established 
by Bai and Yin [3] (see also [27], where the result is proved under optimal moment 
assumptions). For non-asymptotic results in this direction, we refer the reader to papers 
[mun] for the case of i.i.d. entries (see also [28] where no moment conditions are assumed); 
BE] for log-concave distributions of rows and [SHimiHlEnils] for more general isotropic 
distributions. We refer to surveys [TH [29] (see also ini) for more information. 

For random square matrices with independent standard Gaussian entries, the limiting 
distribution of the smallest singular value was computed by Edelman [4]; universality of 
this result was established in m- Further, for matrices with i.i.d. entries it was shown 
in [23] and iza that, given any K > 0 there are R,L>0 depending only on K and 
the law of an such that P{s„(A + B) < n~^} < Rn~^ for any non-random matrix B 
satisfying ||i?|| 2^.2 < (we note that analogous results were recently obtained for more
general models of randomness allowing some dependence between the entries of A] see, 
in particular, na and [Ej). In the case B = 0 which we study in this paper, those papers 
do not provide optimal estimates for $s_n(A)$. A much more precise statement was proved
in [20] under the additional assumption that the entries of $A$ are subgaussian; namely,
Rudelson and Vershynin showed that $s_n(A)$ satisfies a small ball probability estimate

$$\mathbb{P}\{s_n(A) \leq \varepsilon n^{-1/2}\} \leq L\varepsilon + u^n, \quad \varepsilon > 0,$$

where $L > 0$ and $u \in (0,1)$ depend only on the subgaussian moment of the $a_{ij}$'s. Note
that Theorem B gives an estimate of exactly the same form, but for matrices with
heavy-tailed entries.

The idea of the proof of Theorem B can be described as follows. Denote by $A'$
the transpose of the first $n - 1$ columns of $A$. A principal component of the proof of
[20] is an analysis of the arithmetic structure of null vectors of $A'$, which is described
with the help of the notion of the least common denominator (LCD). To show that null
vectors of $A'$ typically have an exponentially large LCD, the authors of [20] consider
subsets $S$ of the unit sphere corresponding to vectors with small LCD, and show that
$\inf_{x \in S} \|A'x\| > 0$ with large probability. For this, they use the standard $\varepsilon$-net argument,
in which the infimum is estimated by taking a Euclidean $\varepsilon$-net $\mathcal{N}$ on $S$ and applying the relation
$\inf_{x \in S} \|A'x\| \geq \inf_{y \in \mathcal{N}} \|A'y\| - \varepsilon\|A'\|_{2\to2}$ together with the estimate $\|A'\|_{2\to2} \leq C\sqrt{n}$, which
holds with probability very close to one under the subgaussian moment assumptions on
the entries. In our setting, the principal difficulty consists in the fact that the condition
(*) does not guarantee a good upper bound for the operator norm $\|A'\|_{2\to2}$. To deal with
this fundamental issue, we “rehne” the nets constructed in m by applying Theorem A. 
Indeed, it can be shown that Theorem A implies that, given an e-net J\f on S, it is possible 
to construct a subset A/" C S' of cardinality at most exp(l3(5nln lA/"] which is an L'e^/n- 
net on S (for some L' = L\5)) with respect to the pseudometric d{x,y) = \\A'{x — |/)|| 
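with probability at least $1 - 4\exp(-\delta n/8)$. Then $\inf_{x \in S} \|A'x\| \geq \inf_{y \in \widetilde{\mathcal{N}}} \|A'y\| - L'\varepsilon\sqrt{n}$, where $\widetilde{\mathcal{N}}$ is the refined net, so the
argument no longer depends on the value of $\|A'\|_{2\to2}$.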
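
For completeness, the one-line computations behind the two net estimates can be written out as follows. This is a sketch of the standard triangle-inequality argument; the symbol $\widetilde{\mathcal{N}}$ for the refined net is a label used here for illustration.

```latex
% Standard net: for x in S pick y in N with ||x - y|| <= eps.
\[
  \|A'x\| \;\ge\; \|A'y\| - \|A'(x-y)\|
          \;\ge\; \inf_{y'\in\mathcal{N}}\|A'y'\| - \varepsilon\,\|A'\|_{2\to 2}.
\]
% Refined net: for x in S pick y in \widetilde{N} with
% d(x,y) = ||A'(x-y)|| <= L' eps sqrt(n).
\[
  \|A'x\| \;\ge\; \|A'y\| - d(x,y)
          \;\ge\; \inf_{y'\in\widetilde{\mathcal{N}}}\|A'y'\| - L'\varepsilon\sqrt{n}.
\]
```

In the second chain the operator norm never appears: the loss term is controlled directly by the pseudometric $d$.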

The paper is organized as follows: Sections 2 and 3 are devoted to proving the main
novel element of the paper, Theorem A. Then, in Section 4 we collect some results
from [20], and, in Section 5, prove Theorem B.

Finally, let us discuss notation. Given a finite set $S$, by $|S|$ we denote its cardinality.
By $e_1, e_2, \dots, e_n$ we denote the canonical basis in $\mathbb{R}^n$. The standard inner product in $\mathbb{R}^n$
shall be denoted by $\langle\cdot,\cdot\rangle$. Given $p \in [1,\infty]$, $\|\cdot\|_p$ is the standard $\ell_p$-norm. For $\ell_2$, we
will simply write $\|\cdot\|$. Given an $m \times n$ matrix $M$ and $p,q \in [1,\infty]$, by $\|M\|_{p\to q}$ we shall
denote the operator norm of $M$ considered as the mapping from $(\mathbb{R}^n, \|\cdot\|_p)$ to $(\mathbb{R}^m, \|\cdot\|_q)$.
Universal positive constants shall be denoted by $C$, $c$. Sometimes, to avoid confusion, we
shall add a numerical subscript to the name of a constant or function defined within a
statement.
2 Fitting a random vector into an $\ell_p^n$-ball

Throughout the paper, by $\mathcal{D}_n$ we denote the set of all $n \times n$ diagonal matrices with
diagonal elements belonging to the interval (0,1] (we will sometimes refer to such matrices 
as positive diagonal contractions). Further, denote by the set of all n x n positive 
diagonal contractions whose diagonal entries belong to the set {1} U {2~^ }£o- The set 
can be regarded as a discretization of Vn- 


In this section, we consider the following problem: Let $X$ be a random vector in $\mathbb{R}^n$
with i.i.d. coordinates. We want to find a random diagonal operator $D$ taking values in $\mathcal{D}_n$
such that $D(X)$ is contained in an appropriate (fixed) multiple of the $\ell_p$-ball everywhere
on the probability space and at the same time the determinant of $D$ is typically “not too
small”. The statement to be proved is


Proposition 2.1. For any $\alpha \in (0,1)$ there is a number $L = L(\alpha) > 0$ with the following
property: Let $\delta \in (0,1]$, $p \in [1,\infty)$ and let $X = (x_1, x_2, \dots, x_n)$ be a random vector on
$(\Omega, \Sigma, \mathbb{P})$ with i.i.d. coordinates such that $\mathbb{E}|x_1|^p < \infty$. Then there is a random positive
diagonal contraction $D$ taking values in $\mathcal{D}_n$ such that

$$\|DX\|_p^p \leq \frac{L}{\delta}\mathbb{E}\|X\|_p^p \text{ everywhere on the probability space, and } \mathbb{E}(\det D)^{p(\alpha-1)} \leq \exp(\delta).$$

Remark 2.2. Proposition 2.1 is a foundation block of our paper. In Section 3 we will
amplify this result (the case $p = 2$) by proving its “matrix version” (Theorem 3.1). The
case $p \neq 2$ in this section is considered just for completeness.

Remark 2.3. Note that a trivial dehnition of the diagonal operator D = (dij) by setting 


d-P 


V ’ ^ ||X|| 


, j = 1,2, ...,n. 


gives an unsatisfactory distribution of the determinant. For example, if the entries of X 
are {0, l}-valued with probability of taking value 1 equal to 1/n, then E||X||(j = 1, and 
for any m < n we have 


P{||X||^ = m} 



\ n—m 


n 


> 


1 

4m™- 


Thus, the above dehnition of D would give P{det H < 2 "-} > ||'2^L/(5] 

Our construction of the required operator is more elaborate. Let us first describe
the idea informally. Assume that $p = 1$ and that $X$ is our random vector with
non-negative i.i.d. coordinates with unit expectations. We consider a sequence of non-negative
numbers (levels) such that each coordinate exceeds the $k$-th level with probability $2^{-k}$. The
main observation is that $X$ “does not fit” into the $\ell_1^n$-ball only if for some $k$ there
are many more than $2^{-k}n$ coordinates of $X$ exceeding the $k$-th level. We define the required
operator $D$ so that its restriction to the “bad” coordinates is an appropriate dilation,
while on all other coordinates it acts isometrically. If there exist several “bad” levels the
operator $D$ will be defined as a product of several diagonal operators. Moreover, it will be
more convenient to “replace” the vector $X$ by a sum of independent vectors of two-valued
variables, such that the sum is a majorant for $X$ on the entire probability space. We
construct the majorant in the coupling Lemma 2.5 stated below.

Given a non-negative random variable $\xi$ with an everywhere continuous cumulative
distribution function (in particular, $\mathbb{P}\{\xi = 0\} = 0$), define numbers $\tau_k(\xi)$ (levels) as

$$\tau_k(\xi) := \inf\{r \geq 0 : \mathbb{P}\{\xi > r\} = 2^{-k}\}, \quad k \geq 0.$$

Note that

$$\mathbb{E}\xi \geq \sum_{k=0}^{\infty} 2^{-k-1}\tau_k(\xi). \quad (2)$$
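
As a sanity check of relation (2), the levels can be computed in closed form for a concrete law. The sketch below assumes $\xi$ is a standard exponential variable, an example chosen here and not used in the paper; for it $\mathbb{P}\{\xi > r\} = e^{-r}$, so $\tau_k(\xi) = k\ln 2$ and $\mathbb{E}\xi = 1$.

```python
import math

def level(k):
    """tau_k for Exp(1): exp(-r) = 2**-k  <=>  r = k * ln 2."""
    return k * math.log(2.0)

mean = 1.0  # expectation of a standard exponential variable
# Right-hand side of (2), truncated at k = 200 (the tail is negligible).
rhs = sum(2.0 ** (-k - 1) * level(k) for k in range(200))
print(rhs, "<=", mean)
```

The truncated sum equals $\ln 2 \cdot \tfrac12\sum_k k2^{-k} = \ln 2$ up to rounding, comfortably below the mean, as (2) predicts.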

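We will need the following standard fact:

Lemma 2.4 (see, for example, [26, Chapter 1, Theorem 3.1]). Let $\xi_1, \xi_2$ be two random
variables on a probability space $(\Omega, \Sigma, \mathbb{P})$, and assume that $\mathbb{P}\{\xi_1 \geq t\} \geq \mathbb{P}\{\xi_2 \geq t\}$ for
all $t \in \mathbb{R}$ (that is, $\xi_2$ is stochastically dominated by $\xi_1$). Then there is a probability space
$(\widetilde\Omega, \widetilde\Sigma, \widetilde{\mathbb{P}})$ and random variables $\tilde\xi_1, \tilde\xi_2$ on $(\widetilde\Omega, \widetilde\Sigma, \widetilde{\mathbb{P}})$ such that 1) $\tilde\xi_i$ is equidistributed with
$\xi_i$, $i = 1,2$, and 2) $\tilde\xi_1 \geq \tilde\xi_2$ everywhere on $\widetilde\Omega$.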

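Lemma 2.5 (Coupling). Let $Y = (y_1, y_2, \dots, y_n)$ be a random vector on a probability space
$(\Omega, \Sigma, \mathbb{P})$ with i.i.d. non-negative coordinates with everywhere continuous cdf and $\mathbb{E}y_1 = 1$.
Further, let $\xi_i^k$ ($i \leq n$, $k = 0,1,\dots$) be 0-1 variables on $(\Omega, \Sigma, \mathbb{P})$ with $\mathbb{P}\{\xi_i^k = 1\} = 2^{-k}$,
and such that $\xi_i^k$ are jointly independent for all $i \leq n$ and $k \geq 0$, and set

$$z_i := \sum_{k=0}^{\infty} \tau_{k+1}(y_i)\,\xi_i^k, \quad i = 1,2,\dots,n,$$

and $Z := (z_1, z_2, \dots, z_n)$. Then there is a probability space $(\widetilde\Omega, \widetilde\Sigma, \widetilde{\mathbb{P}})$ and random vectors
$\widetilde{Y} = (\tilde y_1, \tilde y_2, \dots, \tilde y_n)$ and $\widetilde{Z} = (\tilde z_1, \tilde z_2, \dots, \tilde z_n)$ on $(\widetilde\Omega, \widetilde\Sigma, \widetilde{\mathbb{P}})$ such that

a) $\widetilde{Y}$ and $\widetilde{Z}$ are equidistributed with $Y$ and $Z$, respectively;

b) $\tilde z_i \geq \tilde y_i$ for all $i \leq n$ everywhere on $\widetilde\Omega$.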

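Proof. Fix for a moment $i \leq n$ and consider the distributions of $y_i$ and $z_i$. Take any $t \geq 0$.
If $\tau_k(y_i) \leq t$ for all $k \geq 0$ then, obviously,

$$\mathbb{P}\{z_i > t\} \geq 0 = \mathbb{P}\{y_i > t\}.$$

Otherwise, let $k(t) := \max\{k \geq 0 : \tau_k(y_i) \leq t\}$. Then

$$\mathbb{P}\{z_i > t\} \geq \mathbb{P}\{\tau_{k(t)+1}(y_i)\,\xi_i^{k(t)} \geq \tau_{k(t)+1}(y_i)\} = 2^{-k(t)} = \mathbb{P}\{y_i > \tau_{k(t)}(y_i)\} \geq \mathbb{P}\{y_i > t\}.$$

Thus, $y_i$ is stochastically dominated by $z_i$ and, by Lemma 2.4, there is a probability
space $(\widetilde\Omega_i, \widetilde\Sigma_i, \widetilde{\mathbb{P}}_i)$ and variables $\tilde y_i$ and $\tilde z_i$ on $(\widetilde\Omega_i, \widetilde\Sigma_i, \widetilde{\mathbb{P}}_i)$ equidistributed with $y_i$ and $z_i$,
respectively, such that $\tilde z_i \geq \tilde y_i$ everywhere on $\widetilde\Omega_i$.

Finally, by taking $(\widetilde\Omega, \widetilde\Sigma, \widetilde{\mathbb{P}})$ to be the product space $\prod_{i=1}^n \widetilde\Omega_i$ and naturally extending the
variables $\tilde y_i, \tilde z_i$ to $(\widetilde\Omega, \widetilde\Sigma, \widetilde{\mathbb{P}})$, we obtain the random vectors $\widetilde{Y}, \widetilde{Z}$ satisfying the required
conditions. □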
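
The stochastic domination used in the proof can be observed numerically. The sketch below is an illustration under assumptions made here: the coordinate is standard exponential (so that $\tau_k = k\ln 2$), the series defining $z$ is truncated at 40 terms, and the sample size and seed are arbitrary. It compares the empirical tail of $z$ with the exact tail $e^{-t}$ of $y$.

```python
import math
import random

LN2 = math.log(2.0)

def sample_z(rng, kmax=40):
    """One draw of z = sum_k tau_{k+1} * xi^k with tau_k = k ln 2 and
    P{xi^k = 1} = 2**-k; terms with k >= kmax are dropped (they occur
    with probability below 2**-39)."""
    return sum((k + 1) * LN2 for k in range(kmax) if rng.random() < 2.0 ** -k)

rng = random.Random(1)
draws = [sample_z(rng) for _ in range(20000)]

def tail(t):
    """Empirical estimate of P{z > t}."""
    return sum(z > t for z in draws) / len(draws)

for t in (0.5, 1.0, 2.0, 3.0):
    print(t, tail(t), math.exp(-t))  # empirical tail of z vs exact tail of y
```

Note that $\xi^0 = 1$ with probability one, so every draw satisfies $z \geq \tau_1 = \ln 2$; the majorant is crude, which is all the construction requires.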
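The next lemma provides an actual construction of the required diagonal operator.

Lemma 2.6. For any $\alpha \in (0,1)$ there is $L = L(\alpha) > 0$ with the following property. Let
$(\tau_k)_{k=0}^{\infty}$ be an increasing non-negative sequence satisfying $\sum_{k=0}^{\infty} 2^{-k}\tau_k < \infty$, and let

$$Z := \sum_{k=0}^{\infty} \tau_{k+1}\xi^k,$$

where $\xi^k = (\xi_1^k, \xi_2^k, \dots, \xi_n^k)$ and $\xi_i^k$ ($i \leq n$, $k = 0,1,\dots$) are jointly independent 0-1 random
variables with $\mathbb{P}\{\xi_i^k = 1\} = 2^{-k}$. Further, let $\delta \in (0,1]$. Then there is a random positive
contraction $D$ taking values in $\mathcal{D}_n$ such that

$$\|DZ\|_1 \leq \frac{L}{\delta}\mathbb{E}\|Z\|_1 = \frac{Ln}{\delta}\sum_{k=0}^{\infty} 2^{-k}\tau_{k+1} \quad \text{everywhere on the probability space,}$$

and $\mathbb{E}(\det D)^{\alpha-1} \leq \exp(\delta)$.

Proof. Let $L \geq 2e$ be a number which we will determine later. Now, for each $k \geq 0$,
define random variables

$$\nu_k := |\{i : \xi_i^k \neq 0\}|$$

and

$$\eta_k := \begin{cases} \Bigl(\dfrac{\delta\nu_k}{L2^{-k}n}\Bigr)^{\nu_k}, & \text{if } \delta\nu_k \geq L2^{-k}n; \\ 1, & \text{otherwise.} \end{cases}$$

As building blocks of the contraction $D$, let us consider random diagonal matrices
$D^{(k)} = (d_{ij}^{(k)})$ with

$$d_{jj}^{(k)} := \begin{cases} 1, & \text{if } \xi_j^k = 0; \\ \min\Bigl(1, \dfrac{L2^{-k}n}{\delta\nu_k}\Bigr), & \text{otherwise,} \end{cases} \quad j = 1,2,\dots,n.$$

Then $\det D^{(k)} = \eta_k^{-1}$ and $\|D^{(k)}\xi^k\|_1 \leq \frac{L2^{-k}n}{\delta} = \frac{L}{\delta}\mathbb{E}\|\xi^k\|_1$ (deterministically). Note that
$D^{(k)}$ acts as a dilation on the span of $\{e_i : \xi_i^k \neq 0\}$ provided that $\nu_k > L2^{-k}n/\delta = \frac{L}{\delta}\mathbb{E}\nu_k$, and
as an isometry on the orthogonal complement. We construct the required contraction $D$
as the product of the contractions $D^{(k)}$ by setting $D := \prod_{k=0}^{\infty} D^{(k)}$. Then

$$\|DZ\|_1 \leq \Bigl\|\sum_{k=0}^{\infty} \tau_{k+1}D^{(k)}\xi^k\Bigr\|_1 \leq \frac{Ln}{\delta}\sum_{k=0}^{\infty} \tau_{k+1}2^{-k} = \frac{L}{\delta}\mathbb{E}\|Z\|_1.$$

Note that

$$\mathbb{E}(\det D)^{\alpha-1} = \mathbb{E}\prod_{k=0}^{\infty} \eta_k^{1-\alpha} = \prod_{k=0}^{\infty} \mathbb{E}\eta_k^{1-\alpha}.$$

Next, for every $k \geq 0$ we have

$$\mathbb{E}\eta_k^{1-\alpha} \leq 1 + \sum_{m=\lceil L2^{-k}n/\delta\rceil}^{\infty} \Bigl(\frac{\delta m}{L2^{-k}n}\Bigr)^{m-\alpha m}\mathbb{P}\{\nu_k = m\} \leq 1 + \sum_{m=\lceil L2^{-k}n/\delta\rceil}^{\infty} \Bigl(\frac{e\delta}{L}\Bigr)^{m}\Bigl(\frac{L2^{-k}n}{\delta m}\Bigr)^{\alpha m}.$$

In particular, for all $k$ such that $L2^{-k}n/\delta \geq 1$, using the relation $L \geq 2e$, we obtain

$$\mathbb{E}\eta_k^{1-\alpha} \leq 1 + 2\Bigl(\frac{e\delta}{L}\Bigr)^{\lceil L2^{-k}n/\delta\rceil},$$

and for all $k$ satisfying $L2^{-k}n/\delta < 1$, we get

$$\mathbb{E}\eta_k^{1-\alpha} \leq 1 + \frac{2e\delta}{L}\bigl(L2^{-k}n/\delta\bigr)^{\alpha}.$$

Now, let us choose $L = L(\alpha)$ sufficiently large so that both

$$\sum_{k:\, L2^{-k}n/\delta \geq 1} 2\Bigl(\frac{e\delta}{L}\Bigr)^{\lceil L2^{-k}n/\delta\rceil} \quad \text{and} \quad \sum_{k:\, L2^{-k}n/\delta < 1} \frac{2e\delta}{L}\bigl(L2^{-k}n/\delta\bigr)^{\alpha}$$

are less than $\delta/2$. Then, multiplying the estimates for $\mathbb{E}\eta_k^{1-\alpha}$, we get

$$\prod_{k=0}^{\infty} \mathbb{E}\eta_k^{1-\alpha} \leq \exp(\delta),$$

and the result follows. □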
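
The two deterministic properties of a single building block, $\det D^{(k)} = \eta_k^{-1}$ and $\|D^{(k)}\xi^k\|_1 \leq L2^{-k}n/\delta$, can be checked directly. The sketch below follows the definitions in the proof of Lemma 2.6 as reconstructed above; the function name `block` and the numerical values of $n$, $k$, $L$, $\delta$ and the 0-1 vector are hypothetical.

```python
import math

def block(xi, k, L, delta):
    """Diagonal of D^(k) and the value eta_k for a 0-1 vector xi."""
    n = len(xi)
    nu = sum(xi)                                   # nu_k: number of ones
    cap = L * 2.0 ** -k * n / delta                # L 2^{-k} n / delta
    shrink = min(1.0, cap / nu) if nu > 0 else 1.0
    diag = [shrink if x else 1.0 for x in xi]      # shrink only active coordinates
    eta = (nu / cap) ** nu if nu >= cap else 1.0   # eta_k >= 1
    return diag, eta

# Hypothetical "bad" level: cap = 6 * 2**-4 * 8 / 1 = 3, but nu = 5 ones.
xi = [1, 1, 1, 1, 1, 0, 0, 0]
diag, eta = block(xi, k=4, L=6.0, delta=1.0)
l1_norm = sum(d * x for d, x in zip(diag, xi))     # ||D^(k) xi||_1
print(math.prod(diag) * eta, l1_norm)
```

When the level is not "bad" (fewer ones than the cap), the block is the identity and $\eta_k = 1$, so only the exceptional levels contribute to the determinant.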

Proof of Proposition 2.1. Fix admissible $\alpha$, $\delta$ and $p$. Without loss of generality, the
distribution of the coordinates of the random vector $X$ is continuous on the real line. Indeed,
otherwise we can replace every coordinate $x_i$ with $|x_i| + u_i$, where $u_1, u_2, \dots, u_n$ are jointly
independent with $x_1, x_2, \dots, x_n$ and each $u_i$ is uniformly distributed on $[0,\theta]$ for a very
small parameter $\theta > 0$ chosen so that $\mathbb{E}(|x_i| + u_i)^p \approx \mathbb{E}|x_i|^p$. Then the random diagonal
contraction $D$ constructed for the new vector $X' := (|x_1| + u_1, \dots, |x_n| + u_n)$ will also satisfy the
required properties with respect to $X$.

Set $Y := (|x_1|^p, |x_2|^p, \dots, |x_n|^p)$ and let $\widetilde{Y}, \widetilde{Z}$ be random vectors on a space $(\widetilde\Omega, \widetilde\Sigma, \widetilde{\mathbb{P}})$
constructed in Lemma 2.5 with respect to $Y$. By Lemma 2.6 and in view of relation (2),
we can find a random positive contraction $\widetilde{D}$ on $\widetilde\Omega$ taking values in $\mathcal{D}_n$ such that for some
$L = L(\alpha) > 0$ we have

$$\|\widetilde{D}\widetilde{Y}\|_1 \leq \|\widetilde{D}\widetilde{Z}\|_1 \leq \frac{L}{\delta}\mathbb{E}\|\widetilde{Z}\|_1 \leq \frac{4L}{\delta}\mathbb{E}\|Y\|_1 \quad \text{everywhere on } \widetilde\Omega$$

and

$$\mathbb{E}(\det \widetilde{D})^{\alpha-1} \leq \exp(\delta).$$

In general, the operator $\widetilde{D}$ is not a function of $\widetilde{Y}$, which creates (purely technical) issues
in defining the corresponding operator on the original space $(\Omega, \Sigma, \mathbb{P})$. For completeness, let
us describe an elementary discretization argument resolving the problem:

Let $\{B_z\}$ be a partition of $\mathbb{R}_+^n$ into Borel subsets, indexed over $z = (z_1, z_2, \dots, z_n) \in
(\mathbb{Z} \cup \{-\infty\})^n$ and defined by

$$B_z := \{W \in \mathbb{R}_+^n : W_i \in (2^{z_i-1}, 2^{z_i}] \text{ for all } i = 1,2,\dots,n\}$$

(we set $W_i = 0$ for $z_i = -\infty$). Further, for every $z$ we let

$$\widetilde\Omega_z := \{\tilde\omega \in \widetilde\Omega : \widetilde{Y}(\tilde\omega) \in B_z\}$$

and $Q_z := \widetilde{D}(\widetilde\Omega_z) = \{M \in \mathcal{D}_n : M = \widetilde{D}(\tilde\omega) \text{ for some } \tilde\omega \in \widetilde\Omega_z\}$. For each $z \in (\mathbb{Z} \cup \{-\infty\})^n$
such that $\widetilde\Omega_z$ is non-empty, choose an operator $D_z$ from the closure of $Q_z$ such that
$\det D_z \geq \det M$ for all $M \in Q_z$ (of course, the choice of $D_z$ does not have to be unique).
Otherwise, if fix is empty then we set Dz ;= min (l, j^l^^^E||'F||i)ld„. Finally, dehne 
a function h : M” -E Vn by setting h{W) := Dz for all "w E Bz and z G (Z U {— 00 })”. 
Observe that h is Borel. Further, by the choice of Hx’s, we have det h{Y) > detH 
everywhere on fl, whence E(det h(y))"“^ < exp((5). Next, by the choice of sets Bz, we 
have ||M(PF)||i < 2||M'(^')lli for an?/two couples (M, W), (M', W') E QzXBz- Together 
with the conditions on D and the dehnition of Hx’s, this implies ||Zlx(hF)||i < ^E||y||i 
for all VF G i?x, whence 

8L 

\\h{W) W\\i < —EllFlIi everywhere on M”. 

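Now, taking $T := h(Y)$, we obtain a random diagonal contraction on $(\Omega, \Sigma, \mathbb{P})$ such
that

$$\|T^{1/p}X\|_p^p = \|TY\|_1 \leq \frac{8L}{\delta}\mathbb{E}\|X\|_p^p \quad \text{everywhere on } \Omega$$

and $\mathbb{E}(\det T)^{\alpha-1} \leq \exp(\delta)$. Finally, setting $D := T^{1/p}$, we get the required operator. □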

The above statement can be “tensorized”. In what follows, we are interested only in
the case $p = 2$ and $\alpha = 1/2$.

Proposition 2.7. There is a universal constant $C > 0$ with the following property. Let
$A = (a_{ij})$ be an $n \times n$ random matrix satisfying (*), and let $\delta \in (0,1]$. Then there is a
random positive contraction $D$ taking values in $\mathcal{D}_n$ such that the Euclidean norms of the
rows of $AD$ are uniformly bounded by $C\sqrt{n}/\sqrt{\delta}$ everywhere on the probability space, and

$$\mathbb{E}\det D^{-1} \leq \exp(\delta n).$$

Proof. Indeed, for any $i = 1,2,\dots,n$, let $D_i$ be the positive contraction defined with
respect to the $i$-th row of $A$ using Proposition 2.1 (with parameters $\alpha = 1/2$, $p = 2$),
so that $D_1, D_2, \dots, D_n$ are jointly independent. Then the product of these contractions
$D := \prod_{i=1}^n D_i$ satisfies the required conditions. □

Remark 2.8 Jt is not difficult to^see that for any positive contraction M G Dn there is 
an element M E such that M < \/2M and detM“^ < detM“^. Indeed, this follows 
easily from the fact that for any number t E (0,1] there is 1 G {1} U |2“^ with 
<t < \/2t (the constant \/2 on the right-hand side is achieved for t = \/2/2 — o(l)). 
Hence, the above statement implies that, given a matrix A satisfying (jf| and a number 
(5 > 0, one can construct a random contraction D taking values in such that each 
row of AD has Euclidean norm at most (for some universal constant C* > 0), and 

EdetH”^/^ < exp((jn). 
3 Coverings of random ellipsoids

The main result of the section is 

Theorem 3.1. Let 6 G (0,1] and let A = (aij) be an n x n random matrix satisfying (Q. 
Then 

p|3il G : det -D > exp(—(5n) and ||AZl||oo ^2 < ^ ~ 4exp(—hn/S), 

where > 0 zs a universal constant. 

Remark 3.2. The above theorem can be seen as a way to “regularize” the random matrix 
A by reducing its norm while preserving its “structure”. In this connection, let us mention 
work [To] where a very general problem of regularizing random matrices was discussed 
(see HDl Section 5.4]). 

As we have mentioned in the introduction, Theorem A follows almost immediately
from the above statement; we give the proof of Theorem A at the very end of the section.
The section is organized as follows. First, we use the operator $D$ constructed in Remark 2.8 to verify
Theorem 3.1 under the additional assumption that the entries of $A$ are symmetrically
distributed (see Proposition 3.6). Then, we will apply a symmetrization procedure to
prove Theorem 3.1 in full generality.

A random variable $\xi$ is subgaussian if there exists a number $K > 0$ such that

$$\mathbb{P}\{|\xi| \geq t\} \leq 2\exp(-t^2/K^2), \quad t \geq 0. \quad (3)$$

To put an emphasis on the value of $K$, we will sometimes call $\xi$ $K$-subgaussian. We note
that the smallest value of $K$ satisfying (3) is equivalent to the subgaussian norm of $\xi$ (see,
for example, [2^ Lemma 5.5]); however, the latter notion is less convenient for us and 
will not be used in this paper. 

The next lemma is equivalent to a standard Khintchine-type inequality (see, for ex¬ 
ample, 0)- 

Lemma 3.3. Let $r_1, r_2, \dots, r_n$ be independent Rademacher random variables. Then for
any vector $y \in \mathbb{R}^n$ the random variable $\sum_{i=1}^n y_i r_i$ is $C_{3.3}\|y\|$-subgaussian, where $C_{3.3} > 0$ is
a universal constant.
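
For small $n$ the tail in Lemma 3.3 can be computed exactly by enumerating all sign patterns. The sketch below compares it with Hoeffding's inequality $\mathbb{P}\{|\sum_i y_i r_i| \geq t\} \leq 2\exp(-t^2/(2\|y\|^2))$, which gives one admissible explicit choice of the constant, namely $K = \sqrt{2}\|y\|$ in (3). The source does not specify the value of $C_{3.3}$, and the vector `y` below is an arbitrary example.

```python
import itertools
import math

def tail_prob(y, t):
    """Exact P{|sum_i y_i r_i| >= t} for independent Rademacher signs r_i."""
    hits = sum(abs(sum(s * c for s, c in zip(signs, y))) >= t
               for signs in itertools.product((-1, 1), repeat=len(y)))
    return hits / 2 ** len(y)

y = [3.0, -1.0, 2.0, 0.5, 1.5, -2.5, 1.0, 0.25]   # hypothetical coefficients
norm_sq = sum(c * c for c in y)
checks = [(t, tail_prob(y, t), 2 * math.exp(-t * t / (2 * norm_sq)))
          for t in (0.5, 1, 2, 4, 6, 8, 10, 11.75)]
for t, exact, bound in checks:
    print(t, exact, bound)
```

The last threshold, 11.75, is the sum of the absolute values of the coefficients, so the exact tail there is $2/2^8$.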

The sum of squares of subgaussian variables has good concentration properties; the 
bound below follows from a standard “Laplace transform” argument (see, for example, 
[2^ Corollary 5.17]): 

Lemma 3.4. For any $T > 0$ there is $L_{3.4} > 0$ depending on $T$ with the following property:
Let $\xi_1, \xi_2, \dots, \xi_n$ be independent centered 1-subgaussian random variables. Then

$$\mathbb{P}\Bigl\{\sum_{i=1}^n \xi_i^2 \geq L_{3.4}n\Bigr\} \leq \exp(-Tn).$$
The next proposition implies that for a random matrix $A$ satisfying (*) with symmetrically
distributed entries and the operator $D$ from Remark 2.8, the norm $\|AD\|_{\infty\to2}$ can
be efficiently bounded from above as long as $D$ is a Borel function of $|A|$ (here and further
in the text, given a matrix $B = (b_{ij})$, by $|B|$ we shall denote the matrix $(|b_{ij}|)$).

Proposition 3.5. Let $K > 0$ and let $A$ be an $n \times n$ random matrix satisfying (*), with
symmetrically distributed entries. Further, let $\mathcal{D} \subset \mathcal{D}_n$ be any countable subset. Denote
by $\mathcal{E}$ the event

$$\mathcal{E} := \{\exists D \in \mathcal{D} : \text{all rows of } AD \text{ have Euclidean norms at most } K\sqrt{n}\}.$$

Then

$$\mathbb{P}\{\exists D \in \mathcal{D} : \|AD\|_{\infty\to2} \leq CKn\} \geq \mathbb{P}(\mathcal{E}) - \exp(-n),$$

where $C > 0$ is a universal constant.

Proof. Fix any admissible $K$ and $\mathcal{D}$. Clearly, for any $n \times n$ matrix $B$ and a diagonal
matrix $D$, the Euclidean norms of rows of $BD$ and $|B|D$ are the same. Hence, we may
assume that there is a Borel function $f$ of $|A|$ taking values in $\mathcal{D}$ such that

$$\mathcal{E} = \{\text{all rows of } Af(|A|) \text{ have norms at most } K\sqrt{n}\}.$$

For any $D \in \mathcal{D}$, let

$$\mathcal{E}_D := \mathcal{E} \cap \{f(|A|) = D\}.$$

Without loss of generality, $\mathbb{P}(\mathcal{E}_D) > 0$ for any $D \in \mathcal{D}$.

Next, as the unit cube $[-1,1]^n$ is the convex hull of its vertices $V = \{-1,1\}^n$, we have

$$\|Af(|A|)\|_{\infty\to2} = \sup_{y \in B_\infty^n} \|Af(|A|)y\| = \sup_{v \in V} \|Af(|A|)v\|. \quad (4)$$

Note that, given event $\mathcal{E}_D$, the entries of $Af(|A|) = AD$ are symmetrically distributed,
so the distribution of $ADv$ given $\mathcal{E}_D$ is the same for any vertex $v \in V$. Fix a vertex $v$.
Observe that for any $t > 0$ we have

$$\mathbb{P}_{\mathcal{E}_D}\{\|ADv\| > t\} \leq \sup_B \mathbb{P}\{\|\widetilde{B}Dv\| > t\}, \quad (5)$$

where by $\mathbb{P}_{\mathcal{E}_D}$ we denote the conditional probability given $\mathcal{E}_D$ and the supremum is taken
over all matrices $B = (b_{ij})$ such that the rows of $BD$ have Euclidean norms at most $K\sqrt{n}$,
and $\widetilde{B} = (r_{ij}b_{ij})$, with $r_{ij}$ being jointly independent Rademacher ($\pm1$) variables. Fix any
admissible $B = (b_{ij})$.

Then the variables $\langle\widetilde{B}Dv, e_i\rangle$, $i = 1,2,\dots,n$, are jointly independent and, in view of
Lemma 3.3 and the choice of $D$, each variable $K^{-1}n^{-1/2}\langle\widetilde{B}Dv, e_i\rangle$ is $C_{3.3}$-subgaussian. By
Lemma 3.4 there is a universal constant $C > 0$ such that

$$\mathbb{P}\{\|\widetilde{B}Dv\| > CKn\} = \mathbb{P}\Bigl\{\sum_{i=1}^n K^{-2}n^{-1}\langle\widetilde{B}Dv, e_i\rangle^2 > C^2n\Bigr\} \leq \exp(-(1+\ln 2)n).$$

Then, taking a union bound over the $2^n$ vertices of the unit cube and using (5) and (4), we
get an estimate

$$\mathbb{P}_{\mathcal{E}_D}\{\|AD\|_{\infty\to2} > CKn\} \leq 2^n \cdot \sup_B \mathbb{P}\{\|\widetilde{B}Dv\| > CKn\} \leq \exp(-n).$$

Finally, clearly

$$\mathbb{P}\{\|Af(|A|)\|_{\infty\to2} > CKn\} \leq \mathbb{P}(\mathcal{E}^c) + \sum_{D \in \mathcal{D}} \mathbb{P}_{\mathcal{E}_D}\{\|AD\|_{\infty\to2} > CKn\}\,\mathbb{P}(\mathcal{E}_D) \leq \mathbb{P}(\mathcal{E}^c) + \exp(-n),$$

and the result follows. □
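
The arithmetic of the union bound over the $2^n$ vertices, written out as a worked equation (this only spells out a step used in the proof above):

```latex
\[
  2^{n}\exp\bigl(-(1+\ln 2)\,n\bigr)
  \;=\; \exp\bigl(n\ln 2 - n - n\ln 2\bigr)
  \;=\; \exp(-n).
\]
```

This is why the single-vertex estimate is stated with the exponent $(1+\ln 2)n$ rather than $n$.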


Proposition 3.6. Let 6 G (0,1] and let A = (aij) he an n x n random matrix satisfying 
Q, with symmetrically distributed entries. Then 


P{3Zi) G : det D > exp{—5n) and ||Ai3||oo^2 < 


Mm I \ 1 


2exp(—(5n/4). 


Proof. Fix any 6 G (0,1]. In view of Remark 12.81 there is a random contraction D taking 
values in such that each row of AD has the Euclidean norm at most and 

EdetD"^/^ < exp(5n/4). Denote by £ the event 


£ := {detD >exp(—hn)}. 


In view of the conditions on D and Markov’s inequality, we have 


> 1 — exp(—(5n/4). 


Hence, by Proposition 13.51 taking D to be the set of all contractions from D\ having 
determinant at least exp(—(5n), we obtain 


P{3D G : detD > exp(—hn) and ||HD||oo ^2 < 


mSH I \ 1 


exp(—hn/d) 


exp(—n) 


for a universal constant > 0 . □ 

For the next lemma we will need the following definition (essentially taken from [13]).
Let $S$ be a finite set and $d$ be a pseudometric on $S$. We say that $(S,d)$ is of length at most $\ell$
(for some $\ell > 0$) if there is $n \in \mathbb{N}$, positive numbers $b_1, b_2, \dots, b_n$ with $\|(b_1, b_2, \dots, b_n)\| \leq \ell$
and a sequence $(\mathcal{S}_k)_{k=0}^n$ of partitions of $S$ such that

1. $\mathcal{S}_0 = \{S\}$;

2. $\mathcal{S}_n = \{\{s\}\}_{s \in S}$;

3. $\mathcal{S}_k$ is a refinement of $\mathcal{S}_{k-1}$ for all $k = 1,2,\dots,n$;

4. For each $k \in \{1,2,\dots,n\}$ and any $Q, Q' \in \mathcal{S}_k$ such that $Q \cup Q'$ is a subset of an
element of $\mathcal{S}_{k-1}$, there is a one-to-one mapping $\phi : Q \to Q'$ such that $d(s,\phi(s)) \leq b_k$
for all $s \in Q$.

In particular, the above conditions on $\mathcal{S}_k$ imply that all elements of $\mathcal{S}_k$ have the same
cardinality.


Theorem 3.7 (see [13, Theorem 7.8]). Let $(S,d)$ be a finite pseudometric space of length
at most $\ell$ and let $\mu$ be the normalized counting measure on $S$. Then for any function
$f : S \to \mathbb{R}$ satisfying $|f(s) - f(s')| \leq d(s,s')$ ($s,s' \in S$) and all $t > 0$ we have



fd^i 




Remark 3.8. In [13], the above theorem is formulated for metric spaces. It is easy to see 
that passing to pseudometrics does not change the picture. 

Denote by $\Pi_n$ the set of permutations of $[n] := \{1,2,\dots,n\}$.

Lemma 3.9. Let $y = (y_1, y_2, \dots, y_n)$ be a non-zero vector and $v = (v_1, v_2, \dots, v_n)$ be a
vertex of the cube $[-1,1]^n$. Further, let $\mu$ be the normalized counting measure on $\Pi_n$.
Define a function $f : \Pi_n \to \mathbb{R}$ as

$$f(p) := \sum_{j=1}^n v_{p(j)}y_j, \quad p \in \Pi_n.$$

Then 



f dll 


> t 


< 2 exp 



t > 0. 


Proof. Without loss of generality, we can assume that $|y_j| \geq |y_{j+1}|$ ($j = 1,2,\dots,n-1$).
Define a pseudometric $d$ on $\Pi_n$: for any $p,q \in \Pi_n$ let

$$d(p,q) := |f(p) - f(q)|.$$

Further, we define a sequence of partitions $(\Pi_{n,k})_{k=0}^n$: let $\Pi_{n,0} := \{\Pi_n\}$ and for each
$k = 1,2,\dots,n$, let $\Pi_{n,k}$ consist of all subsets of $\Pi_n$ of the form

$$\{p \in \Pi_n : p(1) = i_1, p(2) = i_2, \dots, p(k) = i_k\}$$

for all $\{i_1, i_2, \dots, i_k\} \subset [n]$.

Now, let $k \in \{1,2,\dots,n\}$ and let $Q, Q' \in \Pi_{n,k}$ be such that $Q \cup Q'$ is a subset of an
element of $\Pi_{n,k-1}$. Note that there are numbers $i_1, i_2, \dots, i_k, i_k'$ such that $p(j) = i_j$ for
all $j < k$ and $p \in Q \cup Q'$; $p(k) = i_k$ for all $p \in Q$ and $p(k) = i_k'$ for all $p \in Q'$. Define a
one-to-one mapping $\phi : Q \to Q'$ by

$$\phi(p)(j) := p(j) \text{ for } j \neq k,\ p^{-1}(i_k'); \quad \phi(p)(k) := i_k'; \quad \phi(p)(p^{-1}(i_k')) := i_k.$$

For any $p \in Q$, we have

$$d(p,\phi(p)) \leq 2|y_k| + 2|y_{p^{-1}(i_k')}| \leq 4|y_k|,$$

with the last inequality due to the fact that $p^{-1}(i_k') \geq k$. Thus, the space $(\Pi_n, d)$ is of
length at most $4\|y\|$. Applying Theorem 3.7 we get the result. □
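
The key inequality $d(p,\phi(p)) \leq 4|y_k|$ can be verified exhaustively for a small example. The sketch below uses 0-based indices and hypothetical vectors `y` (with non-increasing absolute values) and `v`; the helper `swap_image` plays the role of the mapping $\phi$ from the proof, exchanging the value at position $k$ with a value sitting at a later position.

```python
import itertools

def f(p, v, y):
    """f(p) = sum_j v[p(j)] * y[j] for a permutation p given as a tuple."""
    return sum(v[p[j]] * y[j] for j in range(len(y)))

def swap_image(p, k, new_value):
    """phi(p): put new_value at position k and move the old p[k] to the
    position where new_value used to be."""
    q = list(p)
    m = q.index(new_value)
    q[k], q[m] = q[m], q[k]
    return tuple(q)

y = [5.0, -3.0, 2.0, 1.0, -0.5]   # |y_j| is non-increasing
v = [1, -1, -1, 1, 1]             # a vertex of the cube
worst = 0.0
for p in itertools.permutations(range(5)):
    for k in range(5):
        for new_value in p[k + 1:]:   # values located at later positions
            q = swap_image(p, k, new_value)
            ratio = abs(f(p, v, y) - f(q, v, y)) / abs(y[k])
            worst = max(worst, ratio)
print(worst)   # largest observed d(p, phi(p)) / |y_k|
```

For this example the largest ratio is $10/3$, attained when the entries $-3$ and $2$ are paired with opposite signs, which stays below the bound 4 used in the proof.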
The next statement shall be used in a symmetrization argument within the proof of
Theorem 3.1; we think it may be of interest in itself.

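Proposition 3.10. Let $B = (b_{ij})$ be a non-random $n \times n$ matrix such that the Euclidean
norm of every row is at most $\sqrt{n}$ and such that

$$\Bigl|\sum_{j=1}^n b_{ij}\Bigr| \leq \sqrt{n}, \quad i = 1,2,\dots,n.$$

Further, let $\pi_i$ ($i = 1,2,\dots,n$) be independent random permutations uniformly distributed
on $\Pi_n$, and denote by $\widetilde{B} = (\tilde b_{ij})$ the random $n \times n$ matrix with entries defined by

$$\tilde b_{ij} := b_{i\pi_i(j)}.$$

Then

$$\mathbb{P}\{\|\widetilde{B}\|_{\infty\to2} \leq C_{3.10}n\} \geq 1 - \exp(-n)$$

for a universal constant $C_{3.10} > 0$.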


Proof. We will show that for any $v \in \{-1,1\}^n$ we have

$$\mathbb{P}\{\|\widetilde{B}v\| > C_{3.10}n\} \leq \exp(-n - n\ln 2)$$

for a sufficiently large universal constant $C_{3.10}$ and then take the union bound over the
vertices of the cube.

Fix any $v = (v_1, v_2, \dots, v_n) \in \{-1,1\}^n$ and let $m$ be the number of ones in $(v_1, \dots, v_n)$.
Clearly, the random variables $\langle\widetilde{B}v, e_i\rangle$ ($i = 1,2,\dots,n$) are independent. Next, for a fixed
$i$, the distribution of $\langle\widetilde{B}v, e_i\rangle$ coincides with that of the variable $\xi_i := \sum_{j=1}^n v_{\pi_i(j)}b_{ij}$. By
Lemma 3.9 and in view of the condition on the rows of $B$, we have

> r} < 2exp(-^), r > 0. 

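Hence, the variables $n^{-1/2}(\xi_i - \mathbb{E}\xi_i)$ ($i = 1,2,\dots,n$) are $C$-subgaussian for a universal
constant $C > 0$. In view of Lemma 3.4 we get that

$$\mathbb{P}\Bigl\{\sum_{i=1}^n (\xi_i - \mathbb{E}\xi_i)^2 > \widetilde{C}n^2\Bigr\} \leq \exp(-n - n\ln 2) \quad (6)$$

for some constant $\widetilde{C} > 0$. Finally, observe that

$$\sum_{i=1}^n \xi_i^2 \leq 2\sum_{i=1}^n (\xi_i - \mathbb{E}\xi_i)^2 + 2\sum_{i=1}^n (\mathbb{E}\xi_i)^2 \quad \text{(deterministically)},$$

so, applying the estimate

$$|\mathbb{E}\xi_i| \leq \Bigl|\sum_{j=1}^n b_{ij}\Bigr| \leq \sqrt{n}$$

and (6), we obtain

$$\mathbb{P}\{\|\widetilde{B}v\|^2 > (2\widetilde{C} + 2)n^2\} = \mathbb{P}\Bigl\{\sum_{i=1}^n \xi_i^2 > (2\widetilde{C} + 2)n^2\Bigr\} \leq \exp(-n - n\ln 2).$$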

i=l 


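□

Proof of Theorem 3.1. Let $\tilde A = (\tilde a_{ij})$ be an independent copy of $A$. Obviously,

$$\mathbb{E}\sum_{j=1}^n \tilde a_{ij}^2 = \mathbb{E}\Big(\sum_{j=1}^n \tilde a_{ij}\Big)^2 = n$$

for every $i = 1, 2, \dots, n$. Then, in view of Markov's inequality, each row of $\tilde A$ satisfies

$$\sqrt{\sum_{j=1}^n \tilde a_{ij}^2} \le \sqrt{\frac{32n}{\delta}} \quad \text{and} \quad \Big|\sum_{j=1}^n \tilde a_{ij}\Big| \le \sqrt{\frac{32n}{\delta}}$$

with probability at least $1 - \delta/16 \ge \exp(-\delta/8)$. Denote by $\mathcal{E}$ the event

$$\mathcal{E} := \Big\{\sqrt{\sum_{j=1}^n \tilde a_{ij}^2} \le \sqrt{\frac{32n}{\delta}} \ \text{ and } \ \Big|\sum_{j=1}^n \tilde a_{ij}\Big| \le \sqrt{\frac{32n}{\delta}} \ \text{ for all } i = 1, 2, \dots, n\Big\}.$$

In view of the above, $\mathbb{P}(\mathcal{E}) \ge \exp(-\delta n/8)$. Let $\pi_1, \pi_2, \dots, \pi_n$ be random permutations
uniformly distributed on $\Pi_n$ and jointly independent with $\tilde A$, and denote by $\tilde B = (\tilde b_{ij})$ the
random matrix with entries $\tilde b_{ij} := \tilde a_{i\pi_i(j)}$ ($i, j \le n$). Then Proposition 3.10 yields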

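$$\mathbb{P}\big\{\|\tilde B\|_{\infty\to 2} \le C_{3.10}\sqrt{32/\delta}\; n \,\big|\, \tilde A\big\} \ge 1 - \exp(-n)$$

for every realization of $\tilde A$ in $\mathcal{E}$ (apply the proposition to the rescaled matrix $\sqrt{\delta/32}\,\tilde A$),
whence, in particular,

$$\mathbb{P}\big\{\|\tilde B\|_{\infty\to 2} \le C_{3.10}\sqrt{32/\delta}\; n \,\big|\, \mathcal{E}\big\} \ge 1 - \exp(-n).$$

But $\tilde B$ is equidistributed with $\tilde A$ given $\mathcal{E}$, so that

$$\mathbb{P}\big\{\|\tilde A\|_{\infty\to 2} \le C_{3.10}\sqrt{32/\delta}\; n \,\big|\, \mathcal{E}\big\} \ge 1 - \exp(-n).$$

Clearly, $\|\tilde A D\|_{\infty\to 2} \le \|\tilde A\|_{\infty\to 2}$ for any contraction $D \in \mathcal{D}_n$ (deterministically), so we
obtain for the event $\mathcal{E}_1 := \{\|\tilde A D\|_{\infty\to 2} \le C_{3.10}\sqrt{32/\delta}\; n \text{ for all } D \in \mathcal{D}_n\}$:

$$\mathbb{P}(\mathcal{E}_1) \ge (1 - \exp(-n))\,\mathbb{P}(\mathcal{E}) \ge \tfrac{1}{2}\exp(-\delta n/8).$$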

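Next, the matrix $2^{-1/2}(A - \tilde A)$ has symmetrically distributed entries and satisfies the
conditions of Proposition 3.6. Hence,

$$\mathbb{P}\big\{\|(A - \tilde A)D\|_{\infty\to 2} \le C_{3.6}\sqrt{2/\delta}\; n \text{ for some } D \in \mathcal{D}_n \text{ with } \det D \ge \exp(-\delta n)\big\} \ge 1 - 2\exp(-\delta n/4).$$

Conditioning on $\mathcal{E}_1$, we get

$$\mathbb{P}\big\{\|(A - \tilde A)D\|_{\infty\to 2} \le C_{3.6}\sqrt{2/\delta}\; n \text{ for some } D \in \mathcal{D}_n \text{ with } \det D \ge \exp(-\delta n) \,\big|\, \mathcal{E}_1\big\} \ge 1 - \frac{2\exp(-\delta n/4)}{\mathbb{P}(\mathcal{E}_1)} \ge 1 - 4\exp(-\delta n/8).$$

Note that, given $\mathcal{E}_1$, we have $\|AD\|_{\infty\to 2} \le \|(A - \tilde A)D\|_{\infty\to 2} + C_{3.10}\sqrt{32/\delta}\; n$ for all
contractions $D \in \mathcal{D}_n$. Combining this with the last formula, we obtain

$$\mathbb{P}\big\{\|AD\|_{\infty\to 2} \le C_{3.6}\sqrt{2/\delta}\; n + C_{3.10}\sqrt{32/\delta}\; n \text{ for some } D \in \mathcal{D}_n \text{ with } \det D \ge \exp(-\delta n) \,\big|\, \mathcal{E}_1\big\} \ge 1 - 4\exp(-\delta n/8).$$

Finally, since $A$ is independent from $\mathcal{E}_1$, the conditioning in the last estimate can be
dropped, and we obtain the statement. □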


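To complete the proof of Theorem A, we will need two more technical lemmas:

Lemma 3.11. For any $\delta \in (0,1/2]$ and all $n \in \mathbb{N}$ we have

$$\big|\{D \in \mathcal{D}_n : \det D \ge \exp(-\delta n)\}\big| \le \Big(\frac{2e}{\delta}\Big)^{4\delta n}.$$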

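Proof. Denote $S := \{D \in \mathcal{D}_n : \det D \ge \exp(-\delta n)\}$. Note that for any matrix $D \in S$ and
for any $k \ge 0$, the number of diagonal elements of $D$ equal to $2^{-2^k}$ is less than $2^{-k+1}\delta n$.
Hence, the cardinality of $S$ can be estimated as

$$|S| \le \prod_{k=0}^{\infty} \binom{n}{\lfloor 2^{-k+1}\delta n\rfloor} \le \prod_{k=0}^{\infty} \Big(\frac{e}{2^{-k+1}\delta}\Big)^{2^{-k+1}\delta n} \le \Big(\frac{2e}{\delta}\Big)^{4\delta n}.$$

□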


Lemma 3.12. For any $n \in \mathbb{N}$ and $K \in [2, 2\sqrt{n}]$, the unit Euclidean ball $B_2^n$ can be covered
by at most translates of the cube 

Proof. First, note that for any $y \in B_2^n$ we have

$$\Big|\Big\{i \le n : |y_i| \ge \frac{K}{2\sqrt{n}}\Big\}\Big| \le \frac{4n}{K^2}.$$


Hence, it is sufficient to show that the set \{y E Bf : |supp(|/)| < -^}| can be covered 
by at most translates of A simple volumetric argument, together 

with an estimate Vol(i?™) < implies that Bf^ can be covered by at most 7™ 

translates of -^B"^ (for any m G N). Asa consequence, we obtain a covering of ^ 

by at most translates of Finally, the cardinality of the optimal covering 

oi\{y E Blf -. |supp(|/)| < ^}\ can be estimated from above by 


n 


\An/K‘^] 


7\^n/K^^ < (^2eK^) 


2\Sn/K^ 


□ 
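The first step of the proof of Lemma 3.12 is a counting bound: a point of the unit Euclidean ball has at most $4n/K^2$ coordinates of absolute value at least $K/(2\sqrt{n})$. The Python sketch below checks this on sampled points. It is an illustration only; the function names, the sampling scheme and the values of $n$ and $K$ are assumptions made here, not taken from the paper.

```python
import math
import random


def large_coordinates(y, K):
    """Indices i with |y_i| >= K / (2 sqrt(n)), where n = len(y)."""
    n = len(y)
    threshold = K / (2 * math.sqrt(n))
    return [i for i, t in enumerate(y) if abs(t) >= threshold]


def random_point_in_ball(n, rng):
    """Some point of the unit Euclidean ball: Gaussian direction, radius in [0, 1)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in g))
    radius = rng.random()
    return [radius * t / norm for t in g]


rng = random.Random(1)
n, K = 400, 4.0  # K must lie in [2, 2 sqrt(n)] = [2, 40]
counts = [len(large_coordinates(random_point_in_ball(n, rng), K)) for _ in range(50)]
print("largest count:", max(counts), "bound 4n/K^2:", 4 * n / K ** 2)
```

The bound is attained by a vector with exactly $4n/K^2$ coordinates equal to $K/(2\sqrt{n})$ and all other coordinates zero, so it cannot be improved in general.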
Proof of Theorem A. Let 6 G (0,1/4] and n > ^. First, applying Lemma 13.121 with
K = l/y/6, we see that can be covered by (2e/(5)®"'^ translates of the dilated cube 
Let 

Q = {D e ■. det-D > exp(—hn)}. 


1 nn 


Then, in view of Lemma [3.111 we get that can be covered by at most (2e/5)^^"' exp((5?7,) 
parallelepipeds in such a way that for any y G and D E Q, y is covered by a translate 
of D{B^). Combining the two coverings, we get a collection C of parallelepipeds covering 
B 2 such that 


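$$|\mathcal{C}| \le \Big(\frac{2e}{\delta}\Big)^{4\delta n}\exp(\delta n)\cdot\Big(\frac{2e}{\delta}\Big)^{8\delta n} = \exp\Big(\delta n + 12 n\delta\ln\frac{2e}{\delta}\Big),$$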

and for any y G B 2 and D E Q, the set C contains a translate of covering y. 

Finally, applying Theorem 13.11 we get that with probability at least 1 — 4exp(—(Jn/S) for 
some D E Q we have AD (Biff) C implying 

P|V xEBf 3 P eC such that x E P and A{P) G Ax + 

> 1 — 4exp(—5n/8). 

(the multiple “2” in the last formula appears because the translation —Ax + A{P) is not 
origin-symmetric in general). □ 

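Proof of Corollary A. Fix $n$ and $\delta$, and let $\mathcal{C}$ be the collection of parallelepipeds defined
in Theorem A. For each $P \in \mathcal{C}$, choose a point $y_P \in P \cap B_2^n$, and let $\mathcal{N} := \{y_P : P \in \mathcal{C}\}$.
Then, clearly,

$$|\mathcal{N}| = |\mathcal{C}| \le \exp\Big(\delta n + 12 n\delta\ln\frac{2e}{\delta}\Big),$$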

and with probability at least 1 — 4exp(—hn/8) for every x E Bf there is y = y{x) E M 
with —Ax + Ay E In short, 

F{A{Bf)c IJ (|/ + ^i?2”)}>l-4exp(-5n/8). 

y&A{N) 


□ 


4 The smallest singular value — Preliminaries 

As we already mentioned in the introduction, the proof of Theorem B heavily relies on
results obtained by Rudelson and Vershynin in papers [20] and [19]. In this section, we
will state several intermediate results from those papers that we will need in Section 5 to
complete our proof.

A crucial step in the proof of [20, Theorem 1.2] is a decomposition of the unit sphere
into sets of “compressible” and “incompressible” vectors.
Definition 4.1 (Sparse, compressible and incompressible vectors). Fix parameters $\theta, \rho \in
(0,1)$. A vector $x \in \mathbb{R}^n$ is called $\theta n$-sparse if $|\mathrm{supp}(x)| \le \theta n$. A vector $x \in S^{n-1}$ is called
compressible if $x$ is within Euclidean distance $\rho$ from the set of all $\theta n$-sparse vectors.
Otherwise, $x$ will be called incompressible. The set of all compressible unit vectors will
be denoted by $\mathrm{Comp}_n(\theta,\rho)$, and the set of incompressible vectors by $\mathrm{Incomp}_n(\theta,\rho)$.
Sometimes, when the dimension $n$ or the parameters $\theta, \rho$ are clear from the context, we
will simply write Comp, Incomp to denote the sets.
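Definition 4.1 is easy to test for a concrete vector, because the closest $\theta n$-sparse vector to $x$ is obtained by keeping the $\lfloor\theta n\rfloor$ largest coordinates of $x$ in absolute value. The following Python sketch illustrates this; the function names are hypothetical and chosen here for illustration.

```python
import math


def dist_to_sparse(x, sparsity):
    """Euclidean distance from x to the set of vectors with at most
    `sparsity` non-zero coordinates: keep the `sparsity` largest
    coordinates in absolute value and measure what is left over."""
    tail = sorted((abs(t) for t in x), reverse=True)[sparsity:]
    return math.sqrt(sum(t * t for t in tail))


def is_compressible(x, theta, rho):
    """True if the unit vector x lies within distance rho of the set of
    (theta * n)-sparse vectors, in the sense of Definition 4.1."""
    n = len(x)
    return dist_to_sparse(x, math.floor(theta * n)) <= rho


n = 100
basis_vector = [1.0] + [0.0] * (n - 1)   # 1-sparse, hence compressible
flat_vector = [1 / math.sqrt(n)] * n     # all coordinates equal
print(is_compressible(basis_vector, 0.1, 0.5))
print(is_compressible(flat_vector, 0.1, 0.5))
```

For the flat vector the distance to the $10$-sparse vectors is $\sqrt{0.9}$, which exceeds $\rho = 0.5$, so this vector is incompressible for these parameters.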


Remark 4.2. A similar decomposition of the unit sphere was already introduced in an
earlier paper [11] for the purpose of bounding the smallest singular value of rectangular
matrices.

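Obviously, for any $\varepsilon > 0$ we have

$$\mathbb{P}\{s_n(A) \le \varepsilon n^{-1/2}\} \le \mathbb{P}\Big\{\inf_{y \in \mathrm{Comp}} \|Ay\| \le \varepsilon n^{-1/2}\Big\} + \mathbb{P}\Big\{\inf_{y \in \mathrm{Incomp}} \|Ay\| \le \varepsilon n^{-1/2}\Big\}.$$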

Treatment of the compressible vectors is simpler due to the fact that the set Comp is
“small”; we will deal with this set in the first part of Section 5. Let us remark that, unlike
in the subgaussian result of [20], where an estimate for compressible vectors follows almost
directly from an analogue of Lemma 4.9 (see below) together with a standard covering
argument, in our case we will still need to use additional results (proved in Section 3) as
the norm $\|A\|_{2\to 2}$ may be “too large”. We will need the following simple lemma:

Lemma 4.3. For any 0, p E (0,1] the set Comp = Comp,^(6', p) admits a Euclidean “ip-net 
M C Comp of cardinality lA/"] < (e/0)®”(^)^"’. 

Proof. Note that the definition of Comp implies that for any $y \in \mathrm{Comp}$ there is $y' \in S^{n-1}$
such that $|\mathrm{supp}(y')| \le \theta n$ and $\|y - y'\| \le 2\rho$. Hence, it is enough to show that one can
find a Euclidean $\rho$-net $\mathcal{N}$ on the set of $\theta n$-sparse unit vectors, with the required estimate
on $|\mathcal{N}|$. This follows from a standard estimate on the cardinality of an optimal $\rho$-net on
the Euclidean sphere $S^{\lfloor\theta n\rfloor - 1}$ together with a bound for the binomial coefficient. □

Incompressible vectors have the important property that a significant portion of their
coordinates are of order $n^{-1/2}$. In paper [20], this property was referred to as
“incompressible vectors are spread”. For the reader's convenience, we provide a proof of this fact
below (let us note once again that analogous concepts were already considered in [11]).

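Lemma 4.4 ([20, Lemma 3.4]). For any $\theta, \rho \in (0,1)$ and for any vector $x \in \mathrm{Incomp}_n(\theta, \rho)$
there is a subset of indices $\sigma(x) \subset \{1, 2, \dots, n\}$ of cardinality at least $\frac{1}{2}\rho^2\theta n$ such that for
all $i \in \sigma(x)$ we have

$$\frac{\rho}{\sqrt{2n}} \le |x_i| \le \frac{1}{\sqrt{\theta n}}.$$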

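Proof. For every subset $I \subset \{1, 2, \dots, n\}$, let $P_I$ be the coordinate projection onto the
span of $\{e_i : i \in I\}$. Let $\sigma = \sigma(x) := \sigma_1 \cap \sigma_2$, where

$$\sigma_1 := \Big\{i \le n : |x_i| \le \frac{1}{\sqrt{\theta n}}\Big\}, \qquad \sigma_2 := \Big\{i \le n : |x_i| \ge \frac{\rho}{\sqrt{2n}}\Big\}.$$

Since $\|x\| = 1$, we have $|\sigma_1^c| \le \theta n$, and $P_{\sigma_1^c}(x)$ is a $\theta n$-sparse vector. Then the condition
that $x$ is incompressible implies $\|P_{\sigma_1}(x)\| = \|x - P_{\sigma_1^c}(x)\| > \rho$. Hence,

$$\|P_\sigma x\|^2 \ge \|P_{\sigma_1} x\|^2 - \|P_{\sigma_2^c} x\|^2 \ge \rho^2 - n\cdot\|P_{\sigma_2^c}(x)\|_\infty^2 \ge \rho^2/2. \tag{7}$$

On the other hand, in view of the inclusion $\sigma(x) \subset \sigma_1$, we get

$$\|P_\sigma(x)\|^2 \le \|P_\sigma(x)\|_\infty^2\cdot|\sigma| \le \frac{1}{\theta n}\cdot|\sigma|. \tag{8}$$

Together (7) and (8) imply that $|\sigma| \ge \frac{1}{2}\rho^2\theta n$. □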
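The set $\sigma(x)$ of “spread” coordinates from Lemma 4.4 can be computed directly from its definition. The Python sketch below does this for one incompressible vector; the function name `spread_set` and the example vector are assumptions chosen here for illustration.

```python
import math


def spread_set(x, theta, rho):
    """Indices i with rho / sqrt(2n) <= |x_i| <= 1 / sqrt(theta n)."""
    n = len(x)
    lower = rho / math.sqrt(2 * n)
    upper = 1 / math.sqrt(theta * n)
    return [i for i, t in enumerate(x) if lower <= abs(t) <= upper]


n, theta, rho = 100, 0.1, 0.5
# Unit vector supported on half of the coordinates. Its distance to the
# 10-sparse vectors is sqrt(40/50) > rho, so it is incompressible.
x = [1 / math.sqrt(50)] * 50 + [0.0] * 50
sigma = spread_set(x, theta, rho)
print(len(sigma), ">=", 0.5 * rho ** 2 * theta * n)
```

In this example all 50 non-zero coordinates are spread, far more than the guaranteed $\frac{1}{2}\rho^2\theta n$; the lemma only provides a lower bound that holds for every incompressible vector.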

For incompressible vectors we will need the following basic estimate from [20].

Proposition 4.5 ([20, Lemma 3.5]). Let $M$ be a random $n \times n$ matrix with column vectors
$X^1, \dots, X^n$, and let $H_j$ ($j = 1,2,\dots,n$) be the span of all column vectors except the
$j$-th. Then for every $\varepsilon > 0$ we have

$$\mathbb{P}\Big\{\inf_{y \in \mathrm{Incomp}(\theta,\rho)} \|My\| \le \varepsilon\rho n^{-1/2}\Big\} \le \frac{1}{\theta n}\sum_{j=1}^n \mathbb{P}\{\mathrm{dist}(X^j, H_j) \le \varepsilon\}.$$


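In view of independence and equi-measurability of the columns of $A$ in our model, the
above proposition yields for any $\varepsilon > 0$:

$$\mathbb{P}\Big\{\inf_{y \in \mathrm{Incomp}(\theta,\rho)} \|Ay\| \le \varepsilon\rho n^{-1/2}\Big\} \le \frac{1}{\theta}\,\mathbb{P}\Big\{\Big|\sum_{i=1}^n X_i^* a_{in}\Big| \le \varepsilon\Big\},$$

where $X^* = (X_1^*, X_2^*, \dots, X_n^*)$ denotes a random normal unit vector to the span of the
first $n - 1$ columns of $A$. Obtaining small ball probability estimates for $\sum_{i=1}^n X_i^* a_{in}$ was a
crucial ingredient of [20].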
Given a real-valued random variable $\xi$, its Lévy concentration function is defined as

$$\mathcal{L}(\xi, z) := \sup_{\lambda \in \mathbb{R}} \mathbb{P}\{|\xi - \lambda| \le z\}, \quad z \ge 0.$$
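For a random variable with finitely many atoms the Lévy concentration function can be computed exactly: an optimal window $[\lambda - z, \lambda + z]$ can always be shifted to the right until its left endpoint hits an atom without losing mass. The Python sketch below uses this observation; the function name and the representation of the distribution as (value, probability) pairs are assumptions made here for illustration.

```python
def levy_concentration(atoms, z):
    """Exact value of sup over lambda of P{|xi - lambda| <= z} for a finite
    distribution given as a list of (value, probability) pairs."""
    eps = 1e-12  # tolerance for floating-point comparison of the endpoints
    best = 0.0
    for left, _ in atoms:
        # Mass of the closed window of length 2z whose left endpoint is `left`.
        mass = sum(p for value, p in atoms
                   if left - eps <= value <= left + 2 * z + eps)
        best = max(best, mass)
    return best


rademacher = [(-1.0, 0.5), (1.0, 0.5)]
print(levy_concentration(rademacher, 0.5))  # a window of length 1 holds one atom
print(levy_concentration(rademacher, 1.0))  # a window of length 2 holds both atoms
```

In the notation used below, the first call shows that a symmetric $\pm 1$ variable satisfies $\mathcal{L}(\xi, v) \le u$ with $v = 1/2$ and $u = 1/2$.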

First, let us look at some well known estimates of $\mathcal{L}(\cdot,\cdot)$ and then state a stronger bound
from [20].

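Theorem 4.6 (Rogozin, [18]). Let $n \in \mathbb{N}$, let $\xi_1, \xi_2, \dots, \xi_n$ be jointly independent random
variables and let $t_1, t_2, \dots, t_n$ be some positive real numbers. Then for any $t \ge \max_j t_j$ we
have

$$\mathcal{L}\Big(\sum_{i=1}^n \xi_i, t\Big) \le C_{4.6}\, t\,\Big(\sum_{i=1}^n \big(1 - \mathcal{L}(\xi_i, t_i)\big)\,t_i^2\Big)^{-1/2},$$

where $C_{4.6} > 0$ is a universal constant.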

Obviously, if $\xi$ is essentially non-constant, there are $v > 0$ and $u \in (0,1)$ such that
$\mathcal{L}(\xi, v) \le u$. The following lemma is an elementary consequence of Theorem 4.6 (see [11,
Lemma 3.6] and [20, Lemma 2.6] for similar statements proved under additional moment
assumptions on the variable).











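Lemma 4.7. Let $\xi$ be a random variable with $\mathcal{L}(\xi, v) \le u$ for some $v > 0$ and $u \in (0,1)$.
Then there are $v' > 0$ and $u' \in (0,1)$ depending only on $u, v$ with the following property:
Let $\xi_1, \xi_2, \dots, \xi_n$ be independent copies of $\xi$. Then for any vector $y \in S^{n-1}$ we have

$$\mathcal{L}\Big(\sum_{j=1}^n y_j\xi_j, v'\Big) \le u'.$$

Proof. By Theorem 4.6, for any $y \in S^{n-1}$ and any $h \ge \max_j |y_j| v$, we have

$$\mathcal{L}\Big(\sum_{j=1}^n y_j\xi_j, h\Big) \le \frac{C_{4.6}\, h}{v\sqrt{1 - u}}.$$

Define $v' := \frac{v\sqrt{1-u}}{2C_{4.6}}$ and consider two cases.

1) For every $j = 1, \dots, n$ we have $|y_j| \le v'/v$. Then $v' \ge \max_j |y_j| v$, and we obtain from
the above relation

$$\mathcal{L}\Big(\sum_{j=1}^n y_j\xi_j, v'\Big) \le \frac{1}{2}.$$

2) There is $j_0$ such that $|y_{j_0}| > v'/v$. Then we get

$$\mathcal{L}\Big(\sum_{j=1}^n y_j\xi_j, v'\Big) \le \mathcal{L}(y_{j_0}\xi_{j_0}, v') \le \mathcal{L}(\xi, v) \le u.$$

Thus, we can take $u' := \max(1/2, u)$. □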

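Lemma 4.8 (“Tensorization lemma”, [20, Lemma 2.2]). Let $a_1, a_2, \dots, a_n$ be i.i.d. random
variables, and let $\varepsilon_0 \ge 0$.

• Assume that $\mathcal{L}(a_i, \varepsilon) \le L\varepsilon$ for some $L > 0$ and for all $\varepsilon \ge \varepsilon_0$. Then

$$\mathbb{P}\Big\{\sum_{i=1}^n a_i^2 \le \varepsilon^2 n\Big\} \le (CL\varepsilon)^n \quad \text{for all } \varepsilon \ge \varepsilon_0,$$

where $C > 0$ is a universal constant.

• Assume that $\mathcal{L}(a_i, v') \le u'$ for some $v' > 0$ and $u' \in (0,1)$. Then there are $v > 0$
and $u \in (0,1)$ depending only on $u', v'$ such that

$$\mathbb{P}\Big\{\sum_{i=1}^n a_i^2 \le v^2 n\Big\} \le u^n.$$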

As a consequence of Lemmas 4.7 and 4.8 we get

Lemma 4.9. Let $a$ be a random variable with $\mathcal{L}(a, \tilde v) \le \tilde u$ for some $\tilde v > 0$ and $\tilde u \in (0,1)$.
Then there are $v > 0$ and $u \in (0,1)$ depending only on $\tilde u, \tilde v$ with the following property:
Let $A$ be an $n \times n$ random matrix with i.i.d. entries equidistributed with $a$. Then for any
$y \in S^{n-1}$ we have

$$\mathbb{P}\{\|Ay\| \le v\sqrt{n}\} \le u^n.$$

Remark 4.10. Lemma 4.9 can be compared with [11, Proposition 3.4] and [20, Corollary 2.7];
however, those statements were proved with additional assumptions on the
entries of $A$.

To get a stronger estimate than the one obtained in Lemma 4.7, the following notion
was developed in [20] and [19] (see also preceding work [22] by Tao and Vu).

Definition 4.11 (Essential least common denominator). For parameters $r \in (0,1)$ and
$h > 0$ and any non-zero vector $x \in \mathbb{R}^n$, define

$$\mathrm{LCD}_{h,r}(x) := \inf\{t > 0 : \mathrm{dist}(tx, \mathbb{Z}^n) < \min(r\|tx\|, h)\}.$$

We note that later we shall choose $r$ sufficiently small and $h$ to be a small multiple of
$\sqrt{n}$. Thus, most of the coordinates of $\mathrm{LCD}_{h,r}(x)\cdot x$ are within a small distance to integers.
For a detailed discussion of the above notion, we refer to nn. 
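For a low-dimensional vector, the quantity in Definition 4.11 can be approximated by scanning a grid of values of $t$. The Python sketch below is a crude illustration under stated assumptions: the function names, the grid step and the search range `t_max` are choices made here, and since the scan can miss narrow windows of admissible $t$, the value it returns is only an upper estimate of $\mathrm{LCD}_{h,r}(x)$.

```python
import math


def dist_to_integers(y):
    """Euclidean distance from y to the integer lattice Z^n."""
    return math.sqrt(sum((t - round(t)) ** 2 for t in y))


def lcd_upper_estimate(x, h, r, t_max, step=1e-3):
    """Smallest grid point t in (0, t_max] with
    dist(t x, Z^n) < min(r ||t x||, h), or None if no grid point qualifies."""
    norm_x = math.sqrt(sum(c * c for c in x))
    k = 1
    while k * step <= t_max:
        t = k * step
        if dist_to_integers([t * c for c in x]) < min(r * t * norm_x, h):
            return t
        k += 1
    return None


x = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(lcd_upper_estimate(x, h=10.0, r=0.5, t_max=5.0))
```

For this $x$ the multiple $\sqrt{2}\,x = (1,1)$ is an integer point, and a direct computation gives $\mathrm{LCD}_{h,r}(x) = \sqrt{2}/(1+r)$ whenever $h$ is large, which is the value the scan approaches as the step decreases.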

The next statement is proved in [19].

Theorem 4.12 ([19, Theorem 3.4]). Let $\xi_1, \xi_2, \dots, \xi_n$ be independent copies of a centered
random variable $\xi$ such that $\mathcal{L}(\xi, v) \le u$ for some $v > 0$ and $u \in (0,1)$. Further, let
$x = (x_1, x_2, \dots, x_n) \in S^{n-1}$ be a fixed vector. Then for every $h > 0$, $r \in (0,1)$ and for
every

$$\varepsilon \ge \frac{1}{\mathrm{LCD}_{h,r}(x)},$$

we have

$$\mathcal{L}\Big(\sum_{i=1}^n x_i\xi_i, \varepsilon v\Big) \le \frac{C_{4.12}\,\varepsilon}{r\sqrt{1-u}} + C_{4.12}\exp(-2(1-u)h^2),$$

where $C_{4.12}$ is a universal constant.

Thus, in order to get a satisfactory small ball probability estimate for the infimum
over incompressible vectors, it is sufficient to show that the random normal $X^*$ has
exponentially large LCD with probability close to one. This will be done in the second part
of Section 5. As for the set Comp, our treatment of the random normal will be based on
results of Section 3.

5 The smallest singular value — proof of Theorem B 

In this section we give a proof of Theorem B stated in the introduction. Let us start with 
a version of Theorem A more convenient for us: 
Theorem A*. Let 5 G (0,1/4], n > e G (0,1/2], S C 5'"'“^, and let Af C S
be a Euclidean e-net on S. Then there exists a (deterministic) subset Af (Z S with 
l-A/"! < exp(l3(5n In lA/"! such that for any n x n random matrix A satisfying (0, 

with probability at least 1 — 4exp(—(5n/8) the set Af is a {^y/n)-net on S with respect 
to the pseudometric d{x,y) := ||A(x — |/)|| (x,y G where (7* > 0 zs a universal 

constant. 

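Proof. Fix parameters $n$ and $\delta$, and let $\mathcal{C}$ be the collection of parallelepipeds from
Theorem A covering $B_2^n$. Define a set $\tilde{\mathcal{C}} := \{\varepsilon P + y : P \in \mathcal{C},\ y \in \mathcal{N},\ S \cap (\varepsilon P + y) \ne \emptyset\}$ and for
every $\tilde P \in \tilde{\mathcal{C}}$ let $y_{\tilde P}$ be a point in the intersection $S \cap \tilde P$. Finally, set $\tilde{\mathcal{N}} := \{y_{\tilde P} : \tilde P \in \tilde{\mathcal{C}}\}$.
Informally speaking, $\tilde{\mathcal{C}}$ is a “product” of the rescaled collection $\varepsilon\cdot\mathcal{C}$ and the net $\mathcal{N}$. For
each parallelepiped in $\tilde{\mathcal{C}}$ having a non-empty intersection with $S$, we take one (arbitrary)
point from this intersection to construct the refined net $\tilde{\mathcal{N}}$. What remains is to check that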
with high probability Af is indeed a (^\/n)-net on S with respect to the psendometric 
d{x,y) := \\A{x-y)\\. 

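Observe that

$$|\tilde{\mathcal{N}}| = |\tilde{\mathcal{C}}| \le |\mathcal{C}|\cdot|\mathcal{N}| \le \exp\Big(13\delta n\ln\frac{2e}{\delta}\Big)\,|\mathcal{N}|.$$

Next, let $A$ be an $n \times n$ random matrix satisfying (*), and define the event $\mathcal{E}$ as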

S ;= |v x G 3P G C snch that x G P and A(P) C Ax -A P 2 |- 

By Theorem A, we have P(S) > 1 — 4exp(—(Jn/S). 

Fix any point $x \in S$. By the definition of $\mathcal{N}$, there is a vector $y \in \mathcal{N}$ such that
$\varepsilon^{-1}(x - y) \in B_2^n$. Hence, for any point $\omega \in \mathcal{E}$ on the probability space, there is a
parallelepiped $P = P(\omega) \in \mathcal{C}$ such that $\varepsilon^{-1}(x - y) \in P$ and

A.(P) C A^{e-\x - y)) + ^Blf. 

Note that S'fl {eP + y) D {x} 7 ^ 0, whence P := eP + y E C, and, from the above relation, 

A^(P) C A^x + 

whence 

yp 

where yp G Af. We have shown that 

£ C |Vx E S 3y = y{x) E Af snch that ||A(x — t/)|| < 

and the resnlt follows. □ 

Remark 5.1. Let us note that a weaker version of Theorem A*, with the condition $\tilde{\mathcal{N}} \subset S$
dropped, can be proved by applying Corollary A instead of Theorem A.


- C 


eC^/n 


TDn 

^2 5 
At this point, a significant part of our argument follows the same scheme as in [20].
In the first part of this section, we are dealing with compressible vectors.

Proposition 5.2 (Compressible vectors). Let a be a centered random variable with unit 
variance such that £(a,n) < u for some v > 0 and u G (0,1). Then there are numbers 
> 0 ^ (0; 1) depending only on v, u with the following property: Let n G N 

and let A be an n x n random matrix with i.i.d. entries eguidistributed with a. Then for 
Comp = Comp„(6^ 6^ we have 

inf \\Ay\\ < 

f/GComp 

Proof. Without loss of generality, we can assume that $n$ is large. First, note that by
Lemma 4.9 we have a strong probability estimate for any fixed unit vector: there are
$v > 0$ and $u \in (0,1)$, depending only on the parameters in the anti-concentration assumption
on $a$, such that for any $y \in S^{n-1}$ we get

$$\mathbb{P}\{\|Ay\| \le v\sqrt{n}\} \le u^n. \tag{9}$$


In order to obtain a uniform estimate over a set $S = \mathrm{Comp}_n(\theta,\theta)$ for some small
parameter $\theta$, we will take a net $\mathcal{N} \subset S$ constructed in Lemma 4.3 and refine it with the
help of Theorem A* to get a net $\tilde{\mathcal{N}}$ with respect to the pseudometric $\|A(x - y)\|$. We will
apply Theorem A* with parameter $\delta$ defined as the largest number in $(0,1/4]$ so that
exp(l3(5?7,ln y) < Let us describe the procedure in more detail. 

First, dehne parameter 6^ G (0,1/6] as the largest number satisfying the inequalities 


6n 


(^] <u and 


30 ( 7 * V 

- ^ < -. 

5 - 2 


Let S be as above. By Lemma 14.31 there is a 30-net A/" C S' on S' (with respect to the 
usual Euclidean metric) of cardinality lA/"] < (ff)^"". Now, by Theorem A*, there is a 
deterministic subset Af (Z S having the following properties: 

• lA^I < exp(l3(5nlnf) ■ |AA| < 


• With probability at least 1 — 4exp(—(5n/8) for every y E S there exists x{y) G Af 
such that 

\\A{x -y)\\< —-—\/n < -^/n. 


Applying the union bound over Af to relation ([9]), we get 

P{||A|/'|| < n-v/n for some y' G Af} < \Af\u"' < . 

On the other hand, the second property of Af implies that 



inf llAyll < inf \\Ay\\ 

y&S yfzj^ 


^ < 4exp(—5n/8). 


Combining the two estimates, we get 

P{||A?/|] < v\/n/2 for some y E S} < -|- 4exp(—(5?7,/8), 

and the result follows with := max{M^/^, exp(—5/8)}. 


□ 
Remark 5.3. It is not difficult to see that Proposition 5.2 can be stated and proved in the
same way for $A$ which is not square, but instead is an $(n-1) \times n$ matrix with i.i.d. entries
equidistributed with $a$. Indeed, for $n$ large enough we can assume that $\gamma n \le n - 1 \le n$
for $\gamma$ as close to one as we want (the values of the constants in Proposition 5.2 may differ
in that case). This will be important for us later.

Remark 5.4. Proposition 5.2 could be proved by a completely different argument based
on [23, Proposition 13] and not using results of Section 3 at all. However, we prefer to
have a “uniform” treatment of both compressible and incompressible vectors.


Let us turn to estimating the infimum over incompressible vectors. As we already
discussed in Section 4, it suffices to show that the random unit normal vector to the span
of the first $n - 1$ columns of $A$ has exponentially large LCD with probability very close
to one. This property is verified in Theorem 5.9 below. We start with some auxiliary
statements. First, note that Theorem 4.12 together with Lemma 4.8 imply that the
anti-concentration probability for a single vector can be estimated in terms of the LCD of
the vector. Namely, the bigger $\mathrm{LCD}(x)$ is, the smaller is the probability that the image $Ax$
concentrates in a small ball:


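Lemma 5.5 (Small ball probability for a single vector; see [20, Lemma 5.5]). Let $h > 0$,
$r \in (0,1)$ and let $a$ be a random variable satisfying $\mathcal{L}(a, v) \le u$ for some $v > 0$ and
$u \in (0,1)$. Then there is $L_{5.5} \ge 1$ depending only on $v, u$ with the following property: Let
$A'$ be an $(n-1) \times n$ random matrix with i.i.d. elements equidistributed with $a$. Then for
any vector $x \in S^{n-1}$ and any

$$\varepsilon \ge v\cdot\max\Big(\frac{1}{\mathrm{LCD}_{h,r}(x)},\ \exp(-2(1-u)h^2)\Big)$$

we have

$$\mathbb{P}\{\|A'x\| \le \varepsilon\sqrt{n}\} \le (L_{5.5}\,\varepsilon/r)^{n-1}.$$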

Proof. Fix any vector $x \in S^{n-1}$ and denote $Y = (Y_1, Y_2, \dots, Y_{n-1}) := A'x$. Note that, in
view of Theorem 4.12, we have


C{Yi,e) < -^^= + C,^exp(-2(l - u)h‘^) < 

rv 1 — u rv 1 — u 

for any e satisfying conditions of the lemma. Hence, by Lemma 14.81 

n-l , —-IN, 


i < n, 


i=l 


< 


C"(l -f- 


'•'/T 


u 


□ 

The above statement is useful for incompressible vectors: the following Lemma 5.6
shows that incompressible vectors have LCD at least of order $\sqrt{n}$. The lemma is taken
from papers [20, 19], and its proof is included for completeness.

Lemma 5.6 (see [20, Lemma 3.6]). For every $\theta, \rho \in (0,1)$ there are $q_{5.6} = q_{5.6}(\theta,\rho) > 0$
and $r_{5.6} = r_{5.6}(\theta,\rho) > 0$ such that for every $h > 0$ any vector $x \in \mathrm{Incomp}_n(\theta,\rho)$ satisfies

$$\mathrm{LCD}_{h,r_{5.6}}(x) \ge q_{5.6}\sqrt{n}.$$

Proof. Set a := ^p‘^6 and b := pf \/2. We choose r = r^ := = \p^\f9 and q = : = 

(l/y0 + f)"' = V0/3. 

Let X G Incomp„(6', p), h > 0 and assume that LCD/j ,.(3;) < q^/n. Then, by dehnition 
of least common denominator, there exist p G Z" and A G (0, qy/n) such that 

||Aa: — pII < rX < rqy/n = -p^9^/n = -a\/n. (10) 

6 3 

It is easy to check that for a vector with such norm the set 

d'{x) := |z < n : \\xi — pf < 2/3} 

has a cardinality at least (1 —^)n. Further, by Lemma ITTl the set of “spread” coordinates 
a{x) has cardinality at least an. Hence, the set I{x) := a{x) fl a(x) is non-empty, and 
\I{x)\ > |n. For any i E I{x) we have 

I I M I 2 q 2rq 

\pi\ < A Xi + -<^ +-= 1 

3 y/9 a 

(in the last step we used our dehnition of q). Since p E IF, we get that Pi = 0 for all 
i E I{x). 

Finally, due to the dehnition of I{x) and our choice of r, denoting by Pj the coordinate 
projection on a span {i E J : 6*}, we obtain 

||Ax -pll^ > ||AP,(x)f > A^|/(x)|d = = (,A)^ 

which contradicts flTOll and, hence, the assumption that \jCP)h,r{,x) < qx/n. □ 

Let $n \in \mathbb{N}$, $h > 0$, $\theta, \rho \in (0,1)$, and let $q_{5.6}$ and $r_{5.6}$ be as in the above statement.
Following [20], we consider the “level sets” $S_k$ of $\mathrm{Incomp}_n(\theta,\rho)$ defined as

$$S_k = S_k(\theta,\rho,h) := \{x \in \mathrm{Incomp}_n(\theta,\rho) : k \le \mathrm{LCD}_{h,r_{5.6}}(x) < 2k\}, \quad k > 0.$$

In the proof of the theorem below we will partition $\mathrm{Incomp}_n(\theta,\rho)$ into subsets of vectors
having LCD's of the same order:

$$\mathrm{Incomp}_n(\theta,\rho) = \bigcup_{k = 2^i,\ i \ge i_0} S_k, \tag{11}$$

where, using Lemma 5.6, we introduce the lower bound $i_0 := \log_2(q_{5.6}\sqrt{n}/2)$ (we have
$S_k = \emptyset$ for all $k \le q_{5.6}\sqrt{n}/2$). Following [20], we are going to combine estimates for
individual sets $S_k$.
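The dyadic bookkeeping behind (11) amounts to assigning to every finite positive value of the LCD the unique power of two $k$ with $k \le \mathrm{LCD} < 2k$. A minimal Python sketch of this assignment follows; the function name `dyadic_level` is hypothetical and the code is an illustration only.

```python
import math


def dyadic_level(lcd):
    """Return the unique k = 2**i (i an integer) with k <= lcd < 2k."""
    if not (lcd > 0 and math.isfinite(lcd)):
        raise ValueError("lcd must be a positive finite number")
    # frexp writes lcd = m * 2**exponent with 0.5 <= m < 1,
    # so 2**(exponent - 1) <= lcd < 2**exponent.
    _, exponent = math.frexp(lcd)
    return math.ldexp(1.0, exponent - 1)


for value in (0.3, 5.0, 8.0, 1000.0):
    k = dyadic_level(value)
    print(value, "lies in the level with k =", k)
```

Using `math.frexp` rather than a floating-point logarithm avoids rounding problems when the value is an exact power of two.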

A principal observation made in [19] and [21] is that the sets $S_k$ admit Euclidean $\varepsilon$-nets
of relatively small cardinality. We give both the formal statement and its proof from [19]
below for the sake of completeness:

Lemma 5.7 ([19, Lemma 4.8]). For any $\theta, \rho \in (0,1)$ there is $L = L(\theta,\rho) > 0$ such that
for every $h \ge 1$ and $k > 0$ the set $S_k$ admits a Euclidean $(4h/k)$-net of cardinality at most
$(kL/\sqrt{n})^n$.

Proof. In view of Lemma 5.6, we can assume that $k \ge q_{5.6}\sqrt{n}/2$. Further, without loss of
generality $4h/k \le 2$; otherwise a one-point net works.

Fix for a moment a point $x \in S_k$. Then, by definition of the “level sets”, $k \le
\mathrm{LCD}_{h,r_{5.6}}(x) < 2k$. By definition of LCD, there exists $p = p(x) \in \mathbb{Z}^n$ such that

$$\|\mathrm{LCD}_{h,r_{5.6}}(x)\cdot x - p\| \le h.$$

Hence,

$$\Big\|x - \frac{p}{\mathrm{LCD}_{h,r_{5.6}}(x)}\Big\| \le \frac{h}{k} \le \frac{1}{2}.$$

It is a simple planimetric observation that if we normalize the vector $p/\mathrm{LCD}_{h,r_{5.6}}(x)$, the
distance to the unit vector $x$ cannot increase more than twice:

$$\Big\|x - \frac{p}{\|p\|}\Big\| \le \frac{2h}{k}.$$

Thus, the set

$$\mathcal{N}_{\mathrm{int}} := \Big\{\frac{p}{\|p\|} : p = p(x) \text{ for some } x \in S_k\Big\}$$

is a $2h/k$-net for $S_k$. How many different $p \in \mathbb{Z}^n$ do we have to consider? Note that for
any $x \in S_k$, the norm of $p(x)$ cannot be too large: since $\|x\| = 1$, $\mathrm{LCD}_{h,r_{5.6}}(x) < 2k$ and
$4h/k \le 2$, we get

$$\|p(x)\| \le \mathrm{LCD}_{h,r_{5.6}}(x) + h < 3k.$$

Hence, all vectors $p \in \mathbb{Z}^n$ in the definition of $\mathcal{N}_{\mathrm{int}}$ belong to the Euclidean ball of radius
$3k$ centered at the origin. A standard volumetric argument shows that there are at most
$(1 + Ck/\sqrt{n})^n$ integer points in this ball for a sufficiently large constant $C > 0$. Recall
that $k \ge q_{5.6}\sqrt{n}/2$, whence

$$|\mathcal{N}_{\mathrm{int}}| \le \Big(1 + \frac{Ck}{\sqrt{n}}\Big)^n \le \Big(\frac{kL}{\sqrt{n}}\Big)^n$$

for an appropriate number $L = L(\theta,\rho) > 0$. The net $\mathcal{N}_{\mathrm{int}}$ does not have to be contained
in $S_k$. But, by a standard argument, we can “replace” $\mathcal{N}_{\mathrm{int}}$ with a $4h/k$-net of the same
cardinality, and with elements from the set $S_k$. □

Together with Theorem A*, the above lemma gives

Lemma 5.8. For any $\theta, \rho \in (0,1)$ there is $L_{5.8} = L_{5.8}(\theta,\rho) \ge 1$ such that for every $h \ge 1$
and $k > 0$ there is a finite subset $\tilde{\mathcal{N}} \subset S_k$ of cardinality at most $(kL_{5.8}/\sqrt{n})^n$ with the
following property. The event

$$\big\{\text{For every } y \in S_k \text{ there is } y' = y'(y) \in \tilde{\mathcal{N}} \text{ such that } \|A(y - y')\| \le hL_{5.8}\sqrt{n}/k\big\}$$

has probability at least $1 - 4\exp(-n/32)$.

Now, we can prove

Theorem 5.9. Let $a$ be a centered random variable of unit variance such that $\mathcal{L}(a, v) \le u$
for some $v > 0$ and $u \in (0,1)$. Then there exist $q, s, w, r > 0$ depending only on $v, u$
with the following property: let $X^1, \dots, X^{n-1}$ be random $n$-dimensional vectors whose
coordinates are jointly independent copies of $a$. Consider any random unit vector $X^*$
orthogonal to $\{X^1, X^2, \dots, X^{n-1}\}$. Then

$$\mathbb{P}\{\mathrm{LCD}_{s\sqrt{n},r}(X^*) < \exp(qn)\} \le 2\exp(-wn).$$

Proof. Without loss of generality, we can assume that n is a large number and that v <1. 
Denote by A' the n — 1 x n matrix with rows X^, X ^,..., X"'~^. Then, by the dehnition 
of X*, we have A'X* = 0 almost surely. Let 6^ and be dehned as in Remark 15.31 
(with A' replacing A). Then, by Proposition 15.21 and Remark 15.31 we have 

P{X* G Comp„(%^%^} < < exp{-wn) 

for tc > 0 such that, say, exp(—2tc) > and provided that n is large. Thus, it is 
enough to prove that 


LCD^^,.(X*) < exp(gn),X* G Incomp„(%; 2 ], < exp(-M;n) 

for small enough r, w, s, q depending only on n, u. We start by dehning r := r^^6^ 
Note that, by Lemma [5.61 we have 


Incomp„(%^%^ C {x G ^ : LCD, 




[X, 




for any s > 0, and, in particular for s dehned by s := where L^ = 

and are taken from Lemmas 15.81 and lR5l respectively, and q^ = fei). Let us 

emphasize that no vicious cycle is created here in regard to interdependence between s 
and r. Finally, we let q := 2s^(l — u) {w will be dehned at the very end of the proof). 

We will make use of representation fITT]) of the set Incomp„(%^ Denote 

/C := { 2 *: i G [log2(qsirA) - 1, Wln2] riN}. 

Then, in view of Lemma 15.61 we have 


{x G Incomp„(%; 2 i, ; LCD^^,.(a:) < exp(gn)} C |J Sk- 

k&K 


It is sufficient to prove that

$$\mathbb{P}\{X^* \in S_k\} \le 5\exp(-n/32) \quad \text{for all } k \in \mathcal{K}. \tag{12}$$

Indeed, since $|\mathcal{K}| \le qn$, the union bound over $\mathcal{K}$ will conclude the theorem.

In turn, (12) will follow as long as we show that

$$\mathbb{P}\{A'x = 0 \text{ for some } x \in S_k\} \le 5\exp(-n/32) \quad \text{for all } k \in \mathcal{K}.$$
Fix for a moment any k E JC and let A4 be the subset of Sk of cardinality at most
{kL^ ^/nY, constructed in Lemma 15781 (with h := Sy/n). Further, take e := 

Note that, in view of the dehnition of q and /C, we have k < exp(2s^(l — u)n). Hence, for 
n large enough, e satishes the condition of Lemma 15.51 


/I 

e > V ■ max —, exp 
\k 


( - 25^(1 - u)n)^ > V ■ exp(-2(l - u)hY^. 


Hence, 


P{||^'2/|| > eVn for all y G A4} > 1 - |A4|(%5L/’^) 


n—1 


> 1 - 

> 1 - 
> 1 - 




n) [^r 

klwm 
y/n V2/ 

2-"exp(2s2(l -M)n), 


where the last relation follows by the assumption u < 1. Finally, note that, since s < 1/4, 
the last quantity is bounded from below by $1 - 2^{-n/2}$. Applying the definition of $\mathcal{M}$ in
Lemma 5.8 and noticing that $hL_{5.8}\sqrt{n}/k \le \varepsilon\sqrt{n}/2$, we get

$$\mathbb{P}\{\|A'y\| \ge \varepsilon\sqrt{n}/2 \text{ for all } y \in S_k\} \ge 1 - 4\exp(-n/32) - 2^{-n/2} \ge 1 - 5\exp(-n/32).$$

This proves (12) and implies the result. □

Proof of Theorem B. Without loss of generality, the dimension n is large. Let A = (aij) 
be an nxn random matrix with i.i.d. centered entries with unit variance such that for some 
u > 0 and u G (0,1) we have C{aij,v) < u. We dehne 9 := 9^v, u) and v := y^v, u), 
where 6^ are taken from Proposition 15.21 and let q, s, w, r be as in Theorem 15.91 fwith 
respect to v,u). We will prove a small ball probability bound for s„(A). 

It is sufhcient to consider the parameter domain e G (0uexp(—gn), l]. We have 

P{s„(A) < < P{ inf \\Ay\\ < u\/n| + P{ inf \\Ay\\ < 

< + P{ inf \\Ay\\ < 

2 /G Incomp (0,0) 

where we have applied Proposition 15.21 Further, by Proposition 14.5[ we have 


inf \\Ay\\ < en < ip<{ 

ygIncomp^(6,6) 9 I I ' ^ 


i=l 


< 


9 


where X* denotes a random unit normal vector to the span of the hrst n — 1 columns of 
A. In view of Theorem 14.121 this last relation implies 

inf \\Ay\\ < en-^/2} < 0-^P{LCD,^_,(X*) < 9Te-^} 

j/eIncomp,j(6,e) 


+ q^exp(-2g^(l - u)n). 

9vryl — u 
Finally, noticing that 9ve ^ < exp(gn) and applying Theorem 15.91 we get


inf ||y4|/|| < < 26* ^ exp(— tan) + — + C^^exp(— 2s^(l — m)?7,) . 

yeincomp^{g,e) Ovryl — u 


Together with an estimate for the compressible vectors, this implies the result. 


□ 